Showing 1–50 of 130 results for author: Kepner, J

  1. arXiv:2407.01481  [pdf, other]

    cs.DC cs.PF

    LLload: Simplifying Real-Time Job Monitoring for HPC Users

    Authors: Chansup Byun, Julia Mullen, Albert Reuther, William Arcand, William Bergeron, David Bestor, Daniel Burrill, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Peter Michaleas, Guillermo Morales, Andrew Prout, Antonio Rosa, Charles Yee, Jeremy Kepner, Lauren Milechin

    Abstract: One of the more complex tasks for researchers using HPC systems is performance monitoring and tuning of their applications. Developing a practice of continuous performance improvement, both for speed-up and efficient use of resources, is essential to the long-term success of both the HPC practitioner and the research project. Profiling tools provide a nice view of the performance of an application…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2404.14643  [pdf, other]

    cs.CR cs.CY cs.GR cs.NI cs.SI

    Teaching Network Traffic Matrices in an Interactive Game Environment

    Authors: Chasen Milner, Hayden Jananthan, Jeremy Kepner, Vijay Gadepally, Michael Jones, Peter Michaleas, Ritesh Patel, Sandeep Pisharody, Gabriel Wachman, Alex Pentland

    Abstract: The Internet has become a critical domain for modern society that requires ongoing efforts for its improvement and protection. Network traffic matrices are a powerful tool for understanding and analyzing networks and are broadly taught in online graph theory educational resources. Network traffic matrix concepts are rarely available in online computer network and cybersecurity educational resource…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 9 pages, 10 figures, 52 references; accepted to IEEE GrAPL

  3. arXiv:2311.03609  [pdf, other]

    cs.LG

    Testing RadiX-Nets: Advances in Viable Sparse Topologies

    Authors: Kevin Kwak, Zack West, Hayden Jananthan, Jeremy Kepner

    Abstract: The exponential growth of data has sparked computational demands on ML research and industry use. Sparsification of hyper-parametrized deep neural networks (DNNs) creates simpler representations of complex data. Past research has shown that some sparse networks achieve similar performance as dense ones, reducing runtime and storage. RadiX-Nets, a subgroup of sparse DNNs, maintain uniformity which…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 5 pages, 8 figures, accepted to IEEE URTC 2023

  4. arXiv:2311.03574  [pdf, ps, other]

    cs.DB

    Fuzzy Relational Databases via Associative Arrays

    Authors: Kevin Min, Hayden Jananthan, Jeremy Kepner

    Abstract: The increasing rise in artificial intelligence has made the use of imprecise language in computer programs like ChatGPT more prominent. Fuzzy logic addresses this form of imprecise language by introducing the concept of fuzzy sets, where elements belong to the set with a certain membership value (called the fuzzy value). This paper combines fuzzy data with relational algebra to provide the mathema…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 5 pages, accepted to IEEE URTC 2023

  5. arXiv:2311.03562  [pdf, other]

    cs.SI

    From Bits to Insights: Exploring Network Traffic, Traffic Matrices, and Heavy-Tailed Data

    Authors: Christopher Howard, Hayden Jananthan, Jeremy Kepner

    Abstract: With the Internet a central component of modern society, entire industries and fields have developed both in support and against cybersecurity. For cyber operators to best understand their networks, they must conduct detailed traffic analyses. A growing recognition is the ubiquity of heavy-tailed characteristics in network traffic. However, a thorough analysis of cybersecurity programs suggests li…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 5 pages, 5 figures, accepted to IEEE URTC 2023

  6. arXiv:2311.03559  [pdf, other]

    cs.DM

    Algebraic Conditions on One-Step Breadth-First Search

    Authors: Emma Fu, Hayden Jananthan, Jeremy Kepner

    Abstract: The GraphBLAS community has demonstrated the power of linear algebra-leveraged graph algorithms, such as matrix-vector products for breadth-first search (BFS) traversals. This paper investigates the algebraic conditions needed for such computations when working with directed hypergraphs, represented by incidence arrays with entries from an arbitrary value set with binary addition and multiplicatio…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 5 pages, 2 figures, accepted to IEEE URTC 2023

  7. arXiv:2310.18334  [pdf, other]

    cs.AR cs.DC

    Hypersparse Traffic Matrix Construction using GraphBLAS on a DPU

    Authors: William Bergeron, Michael Jones, Chase Barber, Kale DeYoung, George Amariucai, Kaleb Ernst, Nathan Fleming, Peter Michaleas, Sandeep Pisharody, Nathan Wells, Antonio Rosa, Eugene Vasserman, Jeremy Kepner

    Abstract: Low-power small form factor data processing units (DPUs) enable offloading and acceleration of a broad range of networking and security services. DPUs have accelerated the transition to programmable networking by enabling the replacement of FPGAs/ASICs in a wide range of network oriented devices. The GraphBLAS sparse matrix graph open standard math library is well-suited for constructing anonymize…

    Submitted 20 October, 2023; originally announced October 2023.

  8. arXiv:2310.09145  [pdf, other]

    cs.AI cs.DC

    Lincoln AI Computing Survey (LAICS) Update

    Authors: Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner

    Abstract: This paper is an update of the survey of AI accelerators and processors from the past four years, which is now called the Lincoln AI Computing Survey - LAICS (pronounced "lace"). As in past years, this paper collects and summarizes the current commercial accelerators that have been publicly announced with peak performance and peak power consumption numbers. The performance and power values are plotted…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: 7 pages, 6 figures, 2023 IEEE High Performance Extreme Computing (HPEC) conference, September 2023

    ACM Class: C.1.4; C.4

  9. arXiv:2310.03003  [pdf, other]

    cs.CL cs.DC

    From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference

    Authors: Siddharth Samsi, Dan Zhao, Joseph McDonald, Baolin Li, Adam Michaleas, Michael Jones, William Bergeron, Jeremy Kepner, Devesh Tiwari, Vijay Gadepally

    Abstract: Large language models (LLMs) have exploded in popularity due to their new generative capabilities that go far beyond prior state-of-the-art. These technologies are increasingly being leveraged in various domains such as law, finance, and medicine. However, these models carry significant computational challenges, especially the compute and energy costs required for inference. Inference energy costs…

    Submitted 4 October, 2023; originally announced October 2023.

  10. arXiv:2310.00522  [pdf, other]

    cs.SI

    Mapping of Internet "Coastlines" via Large Scale Anonymized Network Source Correlations

    Authors: Hayden Jananthan, Jeremy Kepner, Michael Jones, William Arcand, David Bestor, William Bergeron, Chansup Byun, Timothy Davis, Vijay Gadepally, Daniel Grant, Michael Houle, Matthew Hubbell, Anna Klein, Lauren Milechin, Guillermo Morales, Andrew Morris, Julie Mullen, Ritesh Patel, Alex Pentland, Sandeep Pisharody, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Tyler Trigg, et al. (3 additional authors not shown)

    Abstract: Expanding the scientific tools available to protect computer networks can be aided by a deeper understanding of the underlying statistical distributions of network traffic and their potential geometric interpretations. Analyses of large scale network observations provide a unique window into studying those underlying statistics. Newly developed GraphBLAS hypersparse matrices and D4M associative ar…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: 9 pages, 7 figures, IEEE HPEC 2023 (accepted)

  11. pPython Performance Study

    Authors: Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

    Abstract: pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library (PythonMPI) in pure Python. pPython follows a SPMD (single program multiple data) model of computation. pPython runs on a single-node (e.g., a laptop) running Window…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2208.14908

  12. Deployment of Real-Time Network Traffic Analysis using GraphBLAS Hypersparse Matrices and D4M Associative Arrays

    Authors: Michael Jones, Jeremy Kepner, Andrew Prout, Timothy Davis, William Arcand, David Bestor, William Bergeron, Chansup Byun, Vijay Gadepally, Micheal Houle, Matthew Hubbell, Hayden Jananthan, Anna Klein, Lauren Milechin, Guillermo Morales, Julie Mullen, Ritesh Patel, Sandeep Pisharody, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Peter Michaleas

    Abstract: Matrix/array analysis of networks can provide significant insight into their behavior and aid in their operation and protection. Prior work has demonstrated the analytic, performance, and compression capabilities of GraphBLAS (graphblas.org) hypersparse matrices and D4M (d4m.mit.edu) associative arrays (a mathematical superset of matrices). Obtaining the benefits of these capabilities requires int…

    Submitted 8 December, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

    Comments: Accepted to IEEE HPEC, 8 pages, 8 figures, 1 table, 69 references. arXiv admin note: text overlap with arXiv:2203.13934. text overlap with arXiv:2309.01806

  13. Focusing and Calibration of Large Scale Network Sensors using GraphBLAS Anonymized Hypersparse Matrices

    Authors: Jeremy Kepner, Michael Jones, Phil Dykstra, Chansup Byun, Timothy Davis, Hayden Jananthan, William Arcand, David Bestor, William Bergeron, Vijay Gadepally, Micheal Houle, Matthew Hubbell, Anna Klein, Lauren Milechin, Guillermo Morales, Julie Mullen, Ritesh Patel, Alex Pentland, Sandeep Pisharody, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Tyler Trigg, Charles Yee, et al. (1 additional author not shown)

    Abstract: Defending community-owned cyber space requires community-based efforts. Large-scale network observations that uphold the highest regard for privacy are key to protecting our shared cyberspace. Deployment of the necessary network sensors requires careful sensor placement, focusing, and calibration with significant volumes of network observations. This paper demonstrates novel focusing and calibrati…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: Accepted to IEEE HPEC, 9 pages, 12 figures, 1 table, 63 references, 2 appendices

  14. arXiv:2306.09267  [pdf]

    cs.CY cs.AI cs.DL cs.LG cs.SE

    Are ChatGPT and Other Similar Systems the Modern Lernaean Hydras of AI?

    Authors: Dimitrios Ioannidis, Jeremy Kepner, Andrew Bowne, Harriet S. Bryant

    Abstract: The rise of Generative Artificial Intelligence systems ("AI systems") has created unprecedented social engagement. AI code generation systems provide responses (output) to questions or requests by accessing the vast library of open-source code created by developers over the past few decades. However, they do so by allegedly stealing the open-source code stored in virtual libraries, known as reposi…

    Submitted 30 January, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 40 pages, 100+ references, to appear in Fordham Law Review

  15. arXiv:2211.15552  [pdf]

    cs.AI

    AI Enabled Maneuver Identification via the Maneuver Identification Challenge

    Authors: Kaira Samuel, Matthew LaRosa, Kyle McAlpin, Morgan Schaefer, Brandon Swenson, Devin Wasilefsky, Yan Wu, Dan Zhao, Jeremy Kepner

    Abstract: Artificial intelligence (AI) has enormous potential to improve Air Force pilot training by providing actionable feedback to pilot trainees on the quality of their maneuvers and enabling instructor-less flying familiarization for early-stage trainees in low-cost simulators. Historically, AI challenges consisting of data, problem descriptions, and example code have been critical to fueling AI breakt…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: 10 pages, 7 figures, 4 tables, accepted to and presented at I/ITSEC

  16. AI and ML Accelerator Survey and Trends

    Authors: Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner

    Abstract: This paper updates the survey of AI accelerators and processors from the past three years. This paper collects and summarizes the current commercial accelerators that have been publicly announced with peak performance and power consumption numbers. The performance and power values are plotted on a scatter graph, and a number of dimensions and observations from the trends on this plot are again discuss…

    Submitted 8 October, 2022; originally announced October 2022.

    Comments: 10 pages, 4 figures, 2022 IEEE High Performance Extreme Computing (HPEC) Conference. arXiv admin note: substantial text overlap with arXiv:2009.00993, arXiv:2109.08957

    ACM Class: C.1.4; C.4

  17. arXiv:2209.05725  [pdf, other]

    cs.NI cs.DC

    Hypersparse Network Flow Analysis of Packets with GraphBLAS

    Authors: Tyler Trigg, Chad Meiners, Sandeep Pisharody, Hayden Jananthan, Michael Jones, Adam Michaleas, Timothy Davis, Erik Welch, William Arcand, David Bestor, William Bergeron, Chansup Byun, Vijay Gadepally, Micheal Houle, Matthew Hubbell, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Doug Stetson, Charles Yee, et al. (1 additional author not shown)

    Abstract: Internet analysis is a major challenge due to the volume and rate of network traffic. In lieu of analyzing traffic as raw packets, network analysts often rely on compressed network flows (netflows) that contain the start time, stop time, source, destination, and number of packets in each direction. However, many traffic analyses benefit from temporal aggregation of multiple simultaneous netflows,…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2203.13934, arXiv:2108.06653, arXiv:2008.00307

  18. Large Scale Enrichment and Statistical Cyber Characterization of Network Traffic (Enriquecimiento a gran escala y caracterización cibernética estadística del tráfico de red)

    Authors: Ivan Kawaminami, Arminda Estrada, Youssef Elsakkary, Hayden Jananthan, Aydın Buluç, Tim Davis, Daniel Grant, Michael Jones, Chad Meiners, Andrew Morris, Sandeep Pisharody, Jeremy Kepner

    Abstract: Modern network sensors continuously produce enormous quantities of raw data that are beyond the capacity of human analysts. Cross-correlation of network sensors increases this challenge by enriching every network event with additional metadata. These large volumes of enriched network data present opportunities to statistically characterize network traffic and quickly answer a key question: "What a…

    Submitted 1 December, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

    Comments: 17 pages, 16 figures, HPEC, Spanish version

  19. Python Implementation of the Dynamic Distributed Dimensional Data Model

    Authors: Hayden Jananthan, Lauren Milechin, Michael Jones, William Arcand, William Bergeron, David Bestor, Chansup Byun, Michael Houle, Matthew Hubbell, Vijay Gadepally, Anna Klein, Peter Michaleas, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

    Abstract: Python has become a standard scientific computing language with fast-growing support of machine learning and data analysis modules, as well as an increasing usage of big data. The Dynamic Distributed Dimensional Data Model (D4M) offers a highly composable, unified data model with strong performance built to handle big data fast and efficiently. In this work we present an implementation of D4M in P…

    Submitted 22 November, 2022; v1 submitted 1 September, 2022; originally announced September 2022.

    Comments: 8 pages, 7 figures, accepted to HPEC 2022

  20. pPython for Parallel Python Programming

    Authors: Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Kurt Keville, Anna Klein, Peter Michaleas, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

    Abstract: pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library (PythonMPI) in pure Python. The core data structure in pPython is a distributed numerical array whose distribution onto multiple processors is specified with a map c…

    Submitted 31 August, 2022; originally announced August 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:astro-ph/0606464

  21. arXiv:2208.13068  [pdf, other]

    cs.DB cs.DC

    Apiary: A DBMS-Integrated Transactional Function-as-a-Service Framework

    Authors: Peter Kraft, Qian Li, Kostis Kaffes, Athinagoras Skiadopoulos, Deeptaanshu Kumar, Danny Cho, Jason Li, Robert Redmond, Nathan Weckwerth, Brian Xia, Peter Bailis, Michael Cafarella, Goetz Graefe, Jeremy Kepner, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Xiangyao Yu, Matei Zaharia

    Abstract: Developers increasingly use function-as-a-service (FaaS) platforms for data-centric applications that perform low-latency and transactional operations on data, such as for microservices or web serving. Unfortunately, existing FaaS platforms support these applications poorly because they physically and logically separate application logic, executed in cloud functions, from data management, done in…

    Submitted 30 June, 2023; v1 submitted 27 August, 2022; originally announced August 2022.

    Comments: 14 pages, 13 figures, 3 tables. Preprint

  22. arXiv:2207.07033  [pdf, other]

    cs.AI cs.CY

    Developing a Series of AI Challenges for the United States Department of the Air Force

    Authors: Vijay Gadepally, Gregory Angelides, Andrei Barbu, Andrew Bowne, Laura J. Brattain, Tamara Broderick, Armando Cabrera, Glenn Carl, Ronisha Carter, Miriam Cha, Emilie Cowen, Jesse Cummings, Bill Freeman, James Glass, Sam Goldberg, Mark Hamilton, Thomas Heldt, Kuan Wei Huang, Phillip Isola, Boris Katz, Jamie Koerner, Yen-Chen Lin, David Mayo, Kyle McAlpin, Taylor Perron, et al. (17 additional authors not shown)

    Abstract: Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requireme…

    Submitted 14 July, 2022; originally announced July 2022.

  23. The MIT Supercloud Workload Classification Challenge

    Authors: Benny J. Tang, Qiqi Chen, Matthew L. Weiss, Nathan Frey, Joseph McDonald, David Bestor, Charles Yee, William Arcand, Chansup Byun, Daniel Edelman, Matthew Hubbell, Michael Jones, Jeremy Kepner, Anna Klein, Adam Michaleas, Peter Michaleas, Lauren Milechin, Julia Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Andrew Bowne, Lindsey McEvoy, Baolin Li, Devesh Tiwari, et al. (2 additional authors not shown)

    Abstract: High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogeneous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly larger share of the compute workloads, new approaches to optimized resource usage, allocation, and deployment of new AI frameworks are needed. By identifying compute…

    Submitted 13 April, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: Accepted at IPDPS ADOPT'22

  24. arXiv:2203.13934  [pdf, other]

    cs.NI cs.DC cs.OS cs.SI

    GraphBLAS on the Edge: Anonymized High Performance Streaming of Network Traffic

    Authors: Michael Jones, Jeremy Kepner, Daniel Andersen, Aydin Buluc, Chansup Byun, K Claffy, Timothy Davis, William Arcand, Jonathan Bernays, David Bestor, William Bergeron, Vijay Gadepally, Micheal Houle, Matthew Hubbell, Hayden Jananthan, Anna Klein, Chad Meiners, Lauren Milechin, Julie Mullen, Sandeep Pisharody, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Jon Sreekanth, et al. (3 additional authors not shown)

    Abstract: Long range detection is a cornerstone of defense in many operating domains (land, sea, undersea, air, space, ...). In the cyber domain, long range detection requires the analysis of significant network traffic from a variety of observatories and outposts. Construction of anonymized hypersparse traffic matrices on edge network devices can be a key enabler by providing significant data compression i…

    Submitted 5 September, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted to IEEE HPEC, Outstanding Paper Award, 8 pages, 8 figures, 1 table, 70 references. arXiv admin note: text overlap with arXiv:2108.06653, arXiv:2008.00307, arXiv:2203.10230

  25. Temporal Correlation of Internet Observatories and Outposts

    Authors: Jeremy Kepner, Michael Jones, Daniel Andersen, Aydın Buluç, Chansup Byun, K Claffy, Timothy Davis, William Arcand, Jonathan Bernays, David Bestor, William Bergeron, Vijay Gadepally, Daniel Grant, Micheal Houle, Matthew Hubbell, Hayden Jananthan, Anna Klein, Chad Meiners, Lauren Milechin, Andrew Morris, Julie Mullen, Sandeep Pisharody, Andrew Prout, Albert Reuther, Antonio Rosa, et al. (4 additional authors not shown)

    Abstract: The Internet has become a critical component of modern civilization requiring scientific exploration akin to endeavors to understand the land, sea, air, and space environments. Understanding the baseline statistical distributions of traffic is essential to the scientific understanding of the Internet. Correlating data from different Internet observatories and outposts can be a useful tool for gai…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: 8 pages, 8 figures, 2 tables, 59 references; accepted to GrAPL 2022. arXiv admin note: substantial text overlap with arXiv:2108.06653

  26. arXiv:2201.06096  [pdf, other]

    cs.NI cs.CY cs.DC cs.SI

    New Phenomena in Large-Scale Internet Traffic

    Authors: Jeremy Kepner, Kenjiro Cho, KC Claffy, Vijay Gadepally, Sarah McGuire, Lauren Milechin, William Arcand, David Bestor, William Bergeron, Chansup Byun, Matthew Hubbell, Michael Houle, Michael Jones, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Peter Michaleas

    Abstract: The Internet is transforming our society, necessitating a quantitative understanding of Internet traffic. Our team collects and curates the largest publicly available Internet traffic data sets. An analysis of 50 billion packets using 10,000 processors in the MIT SuperCloud reveals a new phenomenon: the importance of otherwise unseen leaf nodes and isolated links in Internet traffic. Our analysis…

    Submitted 16 January, 2022; originally announced January 2022.

    Comments: 53 pages, 27 figures, 8 tables, 121 references. Portions of this work originally appeared as arXiv:1904.04396v1 which has been split for publication in the book "Massive Graph Analytics" (edited by David Bader)

  27. arXiv:2201.06068  [pdf]

    cs.CR cs.CY cs.NI cs.SI

    Zero Botnets: An Observe-Pursue-Counter Approach

    Authors: Jeremy Kepner, Jonathan Bernays, Stephen Buckley, Kenjiro Cho, Cary Conrad, Leslie Daigle, Keeley Erhardt, Vijay Gadepally, Barry Greene, Michael Jones, Robert Knake, Bruce Maggs, Peter Michaleas, Chad Meiners, Andrew Morris, Alex Pentland, Sandeep Pisharody, Sarah Powazek, Andrew Prout, Philip Reiner, Koichi Suzuki, Kenji Takahashi, Tony Tauber, Leah Walker, Douglas Stetson

    Abstract: Adversarial Internet robots (botnets) represent a growing threat to the safe use and stability of the Internet. Botnets can play a role in launching adversary reconnaissance (scanning and phishing), influence operations (upvoting), and financing operations (ransomware, market manipulation, denial of service, spamming, and ad click fraud) while obfuscating tailored tactical operations. Reducing the…

    Submitted 16 January, 2022; originally announced January 2022.

    Comments: 26 pages, 13 figures, 2 tables, 72 references, submitted to PlosOne

    Report number: Harvard Belfer Center Report (2021 June)

  28. arXiv:2110.01495  [pdf, other]

    cs.CR

    Realizing Forward Defense in the Cyber Domain

    Authors: Sandeep Pisharody, Jonathan Bernays, Vijay Gadepally, Michael Jones, Jeremy Kepner, Chad Meiners, Peter Michaleas, Adam Tse, Doug Stetson

    Abstract: With the recognition of cyberspace as an operating domain, concerted effort is now being placed on addressing it in the whole-of-domain manner found in land, sea, undersea, air, and space domains. Among the first steps in this effort is applying the standard supporting concepts of security, defense, and deterrence to the cyber domain. This paper presents an architecture that helps realize forward…

    Submitted 4 October, 2021; originally announced October 2021.

  29. arXiv:2109.10951  [pdf]

    cs.NE cs.DB cs.DC q-bio.NC

    Naming Schema for a Human Brain-Scale Neural Network

    Authors: Morgan Schaefer, Lauren Michelin, Jeremy Kepner

    Abstract: Deep neural networks have become increasingly large and sparse, allowing for the storage of large-scale neural networks with decreased costs of storage and computation. Storage of a neural network with as many connections as the human brain is possible with current versions of the high-performance Apache Accumulo database and the Distributed Dimensional Data Model (D4M) software. Neural networks o…

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: 4 pages, 2 figures, 1 table, 23 references; accepted to IEEE MIT URTC 2021

  30. AI Accelerator Survey and Trends

    Authors: Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner

    Abstract: Over the past several years, new machine learning accelerators were being announced and released every month for a variety of applications from speech recognition, video object detection, assisted driving, and many data center applications. This paper updates the survey of AI accelerators and processors from the past two years. This paper collects and summarizes the current commercial accelerators tha…

    Submitted 18 September, 2021; originally announced September 2021.

    Comments: 9 pages, 2 figures, IEEE High Performance Extreme Computing Conference 2021

    ACM Class: C.1.4; C.4

  31. 3D Real-Time Supercomputer Monitoring

    Authors: Bill Bergeron, Matthew Hubbell, Dylan Sequeira, Winter Williams, William Arcand, David Bestor, Chansup Byun, Vijay Gadepally, Michael Houle, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

    Abstract: Supercomputers are complex systems producing vast quantities of performance data from multiple sources and of varying types. Performance data from each of the thousands of nodes in a supercomputer tracks multiple forms of storage, memory, networks, processors, and accelerators. Optimization of application performance is critical for cost effective usage of a supercomputer and requires efficient me…

    Submitted 9 September, 2021; originally announced September 2021.

  32. arXiv:2108.11525  [pdf, other]

    cs.DB cs.DC cs.GR cs.HC cs.MM

    Supercomputing Enabled Deployable Analytics for Disaster Response

    Authors: Kaira Samuel, Jeremy Kepner, Michael Jones, Lauren Milechin, Vijay Gadepally, William Arcand, David Bestor, William Bergeron, Chansup Byun, Matthew Hubbell, Michael Houle, Anna Klein, Victor Lopez, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Sid Samsi, Charles Yee, Peter Michaleas

    Abstract: First responders and other forward deployed essential workers can benefit from advanced analytics. Limited network access and software security requirements prevent the usage of standard cloud based microservice analytic platforms that are typically used in industry. One solution is to precompute a wide range of analytics as files that can be used with standard preinstalled software that does not…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: 5 pages, 11 figures, 17 references, accepted to IEEE HPEC 2021

  33. arXiv:2108.11503  [pdf, other]

    cs.AI cs.CV cs.DC cs.PF

    Maneuver Identification Challenge

    Authors: Kaira Samuel, Vijay Gadepally, David Jacobs, Michael Jones, Kyle McAlpin, Kyle Palko, Ben Paulk, Sid Samsi, Ho Chit Siu, Charles Yee, Jeremy Kepner

    Abstract: AI algorithms that identify maneuvers from trajectory data could play an important role in improving flight safety and pilot training. AI challenges allow diverse teams to work together to solve hard problems and are an effective tool for developing AI solutions. AI challenges are also a key driver of AI computational requirements. The Maneuver Identification Challenge hosted at maneuver-id.mit.ed…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: 7 pages, 8 figures, 1 table, 33 references, accepted to IEEE HPEC 2021

  34. Node-Based Job Scheduling for Large Scale Simulations of Short Running Jobs

    Authors: Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

    Abstract: Diverse workloads, such as interactive supercomputing, big data analysis, and large-scale AI algorithm development, require a high-performance scheduler. This paper presents a novel node-based scheduling approach for large scale simulations of short running jobs on MIT SuperCloud systems that allows the resources to be fully utilized for both long running batch jobs while simultaneously providing…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: IEEE HPEC 2021

  35. arXiv:2108.06653  [pdf, other]

    cs.NI cs.DC cs.PF cs.SI

    Spatial Temporal Analysis of 40,000,000,000,000 Internet Darkspace Packets

    Authors: Jeremy Kepner, Michael Jones, Daniel Andersen, Aydin Buluc, Chansup Byun, K Claffy, Timothy Davis, William Arcand, Jonathan Bernays, David Bestor, William Bergeron, Vijay Gadepally, Micheal Houle, Matthew Hubbell, Anna Klein, Chad Meiners, Lauren Milechin, Julie Mullen, Sandeep Pisharody, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Doug Stetson, Adam Tse, et al. (2 additional authors not shown)

    Abstract: The Internet has never been more important to our society, and understanding the behavior of the Internet is essential. The Center for Applied Internet Data Analysis (CAIDA) Telescope observes a continuous stream of packets from an unsolicited darkspace representing 1/256 of the Internet. During 2019 and 2020 over 40,000,000,000,000 unique packets were collected representing the largest ever assem…

    Submitted 14 August, 2021; originally announced August 2021.

    Comments: 8 pages, 9 figures, 2 tables, 43 references, accepted to IEEE HPEC 2021. arXiv admin note: substantial text overlap with arXiv:2008.00307

  36. arXiv:2108.06650  [pdf, other]

    cs.DC cs.DM cs.MS cs.NI cs.PF

    Vertical, Temporal, and Horizontal Scaling of Hierarchical Hypersparse GraphBLAS Matrices

    Authors: Jeremy Kepner, Tim Davis, Chansup Byun, William Arcand, David Bestor, William Bergeron, Vijay Gadepally, Matthew Hubbell, Michael Houle, Michael Jones, Anna Klein, Lauren Milechin, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Peter Michaleas

    Abstract: Hypersparse matrices are a powerful enabler for a variety of network, health, finance, and social applications. Hierarchical hypersparse GraphBLAS matrices enable rapid streaming updates while preserving algebraic analytic power and convenience. In many contexts, the rate of these updates sets the bounds on performance. This paper explores hierarchical hypersparse update performance on a variety o…

    Submitted 14 August, 2021; originally announced August 2021.

    Comments: 6 pages, 5 figures, 32 references, accepted to IEEE HPEC 2021. arXiv admin note: text overlap with arXiv:2001.06935

  37. arXiv:2108.02037  [pdf]

    cs.DC cs.AI cs.LG

    The MIT Supercloud Dataset

    Authors: Siddharth Samsi, Matthew L Weiss, David Bestor, Baolin Li, Michael Jones, Albert Reuther, Daniel Edelman, William Arcand, Chansup Byun, John Holodnack, Matthew Hubbell, Jeremy Kepner, Anna Klein, Joseph McDonald, Adam Michaleas, Peter Michaleas, Lauren Milechin, Julia Mullen, Charles Yee, Benjamin Price, Andrew Prout, Antonio Rosa, Allan Vanterpool, Lindsey McEvoy, Anson Cheng, et al. (2 additional authors not shown)

    Abstract: Artificial intelligence (AI) and Machine learning (ML) workloads are an increasingly larger share of the compute workloads in traditional High-Performance Computing (HPC) centers and commercial cloud systems. This has led to changes in deployment approaches of HPC clusters and the commercial cloud, as well as a new focus on approaches to optimized resource usage, allocations and deployment of new…

    Submitted 4 August, 2021; originally announced August 2021.

  38. Hybrid Power-Law Models of Network Traffic

    Authors: Pat Devlin, Jeremy Kepner, Ashley Luo, Erin Meger

    Abstract: The availability of large scale streaming network data has reinforced the ubiquity of power-law distributions in observations and enabled precision measurements of the distribution parameters. The increased accuracy of these measurements allows new underlying generative network models to be explored. The preferential attachment model is a natural starting point for these models. This work adds add…

    Submitted 29 March, 2021; originally announced March 2021.

    Comments: 8 pages, 4 figures. arXiv admin note: text overlap with arXiv:1904.04396

  39. arXiv:2103.15203  [pdf, other]

    cs.MS cs.DB cs.DM cs.NE math.RA

    Mathematics of Digital Hyperspace

    Authors: Jeremy Kepner, Timothy Davis, Vijay Gadepally, Hayden Jananthan, Lauren Milechin

    Abstract: Social media, e-commerce, streaming video, e-mail, cloud documents, web pages, traffic flows, and network packets fill vast digital lakes, rivers, and oceans that we each navigate daily. This digital hyperspace is an amorphous flow of data supported by continuous streams that stretch standard concepts of type and dimension. The unstructured data of digital hyperspace can be elegantly represented,…

    Submitted 28 March, 2021; originally announced March 2021.

    Comments: 9 pages, 8 figures, 2 tables, accepted to GrAPL 2021. arXiv admin note: text overlap with arXiv:1807.03165, arXiv:2004.01181, arXiv:1909.05631, arXiv:1708.02937

  40. Survey of Machine Learning Accelerators

    Authors: Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner

    Abstract: New machine learning accelerators are being announced and released each month for a variety of applications from speech recognition, video object detection, assisted driving, and many data center applications. This paper updates the survey of AI accelerators and processors from last year's IEEE-HPEC paper. This paper collects and summarizes the current accelerators that have been publicly annou…

    Submitted 31 August, 2020; originally announced September 2020.

    Comments: 12 pages, 2 figures, IEEE-HPEC conference, Waltham, MA, September 21-25, 2020. arXiv admin note: text overlap with arXiv:1908.11348

  41. Accuracy and Performance Comparison of Video Action Recognition Approaches

    Authors: Matthew Hutchinson, Siddharth Samsi, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Micheal Houle, Matthew Hubbell, Micheal Jones, Jeremy Kepner, Andrew Kirby, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Albert Reuther, Charles Yee, Vijay Gadepally

    Abstract: Over the past few years, there has been significant interest in video action recognition systems and models. However, direct comparison of accuracy and computational performance results remains clouded by differing training environments, hardware specifications, hyperparameters, pipelines, and inference methods. This article provides a direct comparison between fourteen off-the-shelf and state-of-t…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at IEEE HPEC 2020

  42. Benchmarking network fabrics for data distributed training of deep neural networks

    Authors: Siddharth Samsi, Andrew Prout, Michael Jones, Andrew Kirby, Bill Arcand, Bill Bergeron, David Bestor, Chansup Byun, Vijay Gadepally, Michael Houle, Matthew Hubbell, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Antonio Rosa, Charles Yee, Albert Reuther, Jeremy Kepner

    Abstract: Artificial Intelligence/Machine Learning applications require the training of complex models on large amounts of labelled data. The large computational requirements for training deep models have necessitated the development of new methods for faster training. One such approach is the data parallel approach, where the training data is distributed across multiple compute nodes. This approach is simp…

    Submitted 18 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at IEEE HPEC 2020

  43. Best of Both Worlds: High Performance Interactive and Batch Launching

    Authors: Chansup Byun, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Andrew Kirby, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Siddharth Samsi, Charles Yee, Albert Reuther

    Abstract: Rapid launch of thousands of jobs is essential for effective interactive supercomputing, big data analysis, and AI algorithm development. Achieving thousands of launches per second has required hardware to be available to receive these jobs. This paper presents a novel preemptive approach to implement spot jobs on MIT SuperCloud systems allowing the resources to be fully utilized for both long run…

    Submitted 5 August, 2020; originally announced August 2020.

  44. Multi-Temporal Analysis and Scaling Relations of 100,000,000,000 Network Packets

    Authors: Jeremy Kepner, Chad Meiners, Chansup Byun, Sarah McGuire, Timothy Davis, William Arcand, Jonathan Bernays, David Bestor, William Bergeron, Vijay Gadepally, Raul Harnasch, Matthew Hubbell, Micheal Houle, Micheal Jones, Andrew Kirby, Anna Klein, Lauren Milechin, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Doug Stetson, Adam Tse, Charles Yee, et al. (1 additional author not shown)

    Abstract: Our society has never been more dependent on computer networks. Effective utilization of networks requires a detailed understanding of the normal background behaviors of network traffic. Large-scale measurements of networks are computationally challenging. Building on prior work in interactive supercomputing and GraphBLAS hypersparse hierarchical traffic matrices, we have developed an efficient me…

    Submitted 1 August, 2020; originally announced August 2020.

    Comments: 6 pages, 6 figures, 3 tables, 49 references, accepted to IEEE HPEC 2020

  45. arXiv:2007.11112  [pdf, other]

    cs.OS cs.AR cs.DB cs.DC cs.NI

    DBOS: A Proposal for a Data-Centric Operating System

    Authors: Michael Cafarella, David DeWitt, Vijay Gadepally, Jeremy Kepner, Christos Kozyrakis, Tim Kraska, Michael Stonebraker, Matei Zaharia

    Abstract: Current operating systems are complex systems that were designed before today's computing environments. This makes it difficult for them to meet the scalability, heterogeneity, availability, and security challenges in current cloud and parallel computing environments. To address these problems, we propose a radically new OS design based on data-centric architecture: all operating system state shou…

    Submitted 21 July, 2020; originally announced July 2020.

  46. arXiv:2007.07336  [pdf, other]

    cs.LG cs.DC cs.PF stat.ML

    Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid

    Authors: Andrew C. Kirby, Siddharth Samsi, Michael Jones, Albert Reuther, Jeremy Kepner, Vijay Gadepally

    Abstract: A Multigrid Full Approximation Storage algorithm for solving Deep Residual Networks is developed to enable neural network parallelized layer-wise training and concurrent computational kernel execution on GPUs. This work demonstrates a 10.2x speedup over traditional layer-wise model parallelism techniques using the same number of compute units.

    Submitted 30 August, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: 7 pages, 6 figures, 27 citations. Accepted to 2020 IEEE High Performance Extreme Computing Conference - Outstanding Paper Award

  47. Fast Mapping onto Census Blocks

    Authors: Jeremy Kepner, Andreas Kipf, Darren Engwirda, Navin Vembar, Michael Jones, Lauren Milechin, Vijay Gadepally, Chris Hill, Tim Kraska, William Arcand, David Bestor, William Bergeron, Chansup Byun, Matthew Hubbell, Michael Houle, Andrew Kirby, Anna Klein, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Sid Samsi, Charles Yee, Peter Michaleas

    Abstract: Pandemic measures such as social distancing and contact tracing can be enhanced by rapidly integrating dynamic location data and demographic data. Projecting billions of longitude and latitude locations onto hundreds of thousands of highly irregular demographic census block polygons is computationally challenging in both research and deployment contexts. This paper describes two approaches labeled…

    Submitted 1 August, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

    Comments: 8 pages, 7 figures, 55 references; accepted to IEEE HPEC 2020

  48. arXiv:2004.01181  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    GraphChallenge.org Sparse Deep Neural Network Performance

    Authors: Jeremy Kepner, Simon Alford, Vijay Gadepally, Michael Jones, Lauren Milechin, Albert Reuther, Ryan Robinett, Sid Samsi

    Abstract: The MIT/IEEE/Amazon GraphChallenge.org encourages community approaches to developing new solutions for analyzing graphs and sparse data. Sparse AI analytics present unique scalability difficulties. The Sparse Deep Neural Network (DNN) Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a challenge that is reflective of emerging sp…

    Submitted 5 April, 2020; v1 submitted 24 March, 2020; originally announced April 2020.

    Comments: 7 pages, 7 figures, 80 references, to be submitted to IEEE HPEC 2020. This work reports new updated results on prior work reported in arXiv:1909.05631. arXiv admin note: substantial text overlap with arXiv:1807.03165, arXiv:1708.02937. arXiv admin note: text overlap with arXiv:2003.09269

  49. arXiv:2004.00190  [pdf]

    cs.DB

    Technical Report: Developing a Working Data Hub

    Authors: Vijay Gadepally, Jeremy Kepner

    Abstract: Data forms a key component of any enterprise. The need for high quality and easy access to data is further amplified by organizations wishing to leverage machine learning or artificial intelligence for their operations. To this end, many organizations are building resources for managing heterogeneous data, providing end-users with an organization-wide view of available data, and acting as a central…

    Submitted 17 April, 2020; v1 submitted 31 March, 2020; originally announced April 2020.

    Comments: Fixes typographical errors; references updated; minor content updates in Section 4. arXiv admin note: substantial text overlap with arXiv:1905.03592

  50. GraphChallenge.org Triangle Counting Performance

    Authors: Siddharth Samsi, Jeremy Kepner, Vijay Gadepally, Michael Hurley, Michael Jones, Edward Kao, Sanjeev Mohindra, Albert Reuther, Steven Smith, William Song, Diane Staheli, Paul Monticciolo

    Abstract: The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of pre-pa…

    Submitted 18 March, 2020; originally announced March 2020.

    Comments: 10 pages, 8 figures, 121 references, to be submitted to IEEE HPEC 2020. This work reports new updated results on prior work reported in arXiv:1805.09675 & arXiv:1708.06866