Showing 1–20 of 20 results for author: John, L

  1. arXiv:2405.00820  [pdf, other]

    cs.AR cs.LG

    HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond

    Authors: Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao

    Abstract: Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). Nevertheless, the scarcity of accessible high-quality HLS datasets and the complexity of building such datasets present challenges. Existing datasets have limitations in terms of benchmark coverage, design space enumeration, vendor extens…

    Submitted 17 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: Edit to "Section V.E" for proper attribution of open-source HLSyn, AutoDSE, and the Merlin compiler

  2. arXiv:2311.11384  [pdf, other]

    cs.AR

    PIMSAB: A Processing-In-Memory System with Spatially-Aware Communication and Bit-Serial-Aware Computation

    Authors: Aman Arora, Jian Weng, Siyuan Ma, Tony Nowatzki, Lizy K. John

    Abstract: Bit-serial Processing-In-Memory (PIM) is an attractive paradigm for accelerator architectures, for parallel workloads such as Deep Learning (DL), because of its capability to achieve massive data parallelism at a low area overhead and provide orders-of-magnitude data movement savings by moving computational resources closer to the data. While many PIM architectures have been proposed, improvements…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: Aman Arora and Jian Weng are co-first authors with equal contribution

  3. arXiv:2304.10618  [pdf, other]

    cs.AR eess.SP

    ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks

    Authors: Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael F. Katopodis, Leandro S. de Araujo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. Franca, Mauricio Breternitz Jr., Lizy K. John

    Abstract: The deployment of AI models on low-power, real-time edge devices requires accelerators for which energy, latency, and area are all first-order concerns. There are many approaches to enabling deep neural networks (DNNs) in this domain, including pruning, quantization, compression, and binary neural networks (BNNs), but with the emergence of the "extreme edge", there is now a demand for even more ef…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: 14 pages, 14 figures. Portions of this article draw heavily from arXiv:2203.01479, most notably sections 5E and 5F.2

  4. arXiv:2302.10977  [pdf, other]

    cs.AR cs.LG

    HLSDataset: Open-Source Dataset for ML-Assisted FPGA Design using High Level Synthesis

    Authors: Zhigang Wei, Aman Arora, Ruihao Li, Lizy K. John

    Abstract: Machine Learning (ML) has been widely adopted in design exploration using high level synthesis (HLS) to give a better and faster performance, and resource and power estimation at very early stages for FPGA-based design. To perform prediction accurately, high-quality and large-volume datasets are required for training ML models. This paper presents a dataset for ML-assisted FPGA design using HLS, ca…

    Submitted 21 August, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: 8 pages, 5 figures

  5. arXiv:2203.12521  [pdf, other]

    cs.AR

    CoMeFa: Compute-in-Memory Blocks for FPGAs

    Authors: Aman Arora, Tanmay Anand, Aatman Borda, Rishabh Sehgal, Bagus Hanindhito, Jaydeep Kulkarni, Lizy K. John

    Abstract: Block RAMs (BRAMs) are the storage houses of FPGAs, providing extensive on-chip memory bandwidth to the compute units implemented using Logic Blocks (LBs) and Digital Signal Processing (DSP) slices. We propose modifying BRAMs to convert them to CoMeFa (Compute-In-Memory Blocks for FPGAs) RAMs. These RAMs provide highly-parallel compute-in-memory by combining computation and storage capabilities in…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: 10 pages, 12 figures, 4 tables, FCCM conference

  6. arXiv:2203.01479  [pdf, other]

    cs.AR cs.LG

    Weightless Neural Networks for Efficient Edge Inference

    Authors: Zachary Susskind, Aman Arora, Igor Dantas Dos Santos Miranda, Luis Armando Quintanilla Villon, Rafael Fontella Katopodis, Leandro Santiago de Araujo, Diego Leonel Cadette Dutra, Priscila Machado Vieira Lima, Felipe Maia Galvao Franca, Mauricio Breternitz Jr., Lizy K. John

    Abstract: Weightless Neural Networks (WNNs) are a class of machine learning model which use table lookups to perform inference. This is in contrast with Deep Neural Networks (DNNs), which use multiply-accumulate operations. State-of-the-art WNN architectures have a fraction of the implementation cost of DNNs, but still lag behind them on accuracy for common image recognition tasks. Additionally, many existi…

    Submitted 2 March, 2022; originally announced March 2022.

  7. arXiv:2109.06133  [pdf, other]

    cs.AI cs.LG cs.NE cs.PF

    Neuro-Symbolic AI: An Emerging Class of AI Workloads and their Characterization

    Authors: Zachary Susskind, Bryce Arden, Lizy K. John, Patrick Stockton, Eugene B. John

    Abstract: Neuro-symbolic artificial intelligence is a novel area of AI research which seeks to combine traditional rules-based AI approaches with modern deep learning techniques. Neuro-symbolic models have already demonstrated the capability to outperform state-of-the-art deep learning models in domains such as image and video reasoning. They have also been shown to obtain high accuracy with significantly l…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: 11 pages, 7 figures

    ACM Class: C.4; I.2.m

  8. arXiv:2107.09178  [pdf, other]

    cs.AR eess.SP

    Compute RAMs: Adaptable Compute and Storage Blocks for DL-Optimized FPGAs

    Authors: Aman Arora, Bagus Hanindhito, Lizy K. John

    Abstract: The configurable building blocks of current FPGAs -- Logic blocks (LBs), Digital Signal Processing (DSP) slices, and Block RAMs (BRAMs) -- make them efficient hardware accelerators for the rapid-changing world of Deep Learning (DL). Communication between these blocks happens through an interconnect fabric consisting of switching elements spread throughout the FPGA. In this paper, a new block, Comp…

    Submitted 30 September, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: 8 pages, IEEE Signal Processing Society's ASILOMAR Conference on Signals, Systems and Computers

  9. arXiv:2106.07087  [pdf, other]

    cs.AR

    Koios: A Deep Learning Benchmark Suite for FPGA Architecture and CAD Research

    Authors: Aman Arora, Andrew Boutros, Daniel Rauch, Aishwarya Rajen, Aatman Borda, Seyed Alireza Damghani, Samidh Mehta, Sangram Kate, Pragnesh Patel, Kenneth B. Kent, Vaughn Betz, Lizy K. John

    Abstract: With the prevalence of deep learning (DL) in many applications, researchers are investigating different ways of optimizing FPGA architecture and CAD to achieve better quality-of-results (QoR) on DL-based workloads. In this optimization process, benchmark circuits are an essential component; the QoR achieved on a set of benchmarks is the main driver for architecture and CAD design choices. However,…

    Submitted 13 June, 2021; originally announced June 2021.

  10. arXiv:2012.05181  [pdf, ps, other]

    cs.AR

    Virtual-Link: A Scalable Multi-Producer, Multi-Consumer Message Queue Architecture for Cross-Core Communication

    Authors: Qinzhe Wu, Jonathan Beard, Ashen Ekanayake, Andreas Gerstlauer, Lizy K. John

    Abstract: Cross-core communication is increasingly a bottleneck as the number of processing elements per system-on-chip increases. Typical hardware solutions to cross-core communication are often inflexible; while software solutions are flexible, they have performance scaling limitations. A key problem, as we will show, is that of shared state in software-based message queue mechanisms. This paper proposes V…

    Submitted 19 January, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

  11. arXiv:2008.07361  [pdf]

    stat.AP cs.LG stat.ME stat.ML

    How little data do we need for patient-level prediction?

    Authors: Luis H. John, Jan A. Kors, Jenna M. Reps, Patrick B. Ryan, Peter R. Rijnbeek

    Abstract: Objective: Provide guidance on sample size considerations for developing predictive models by empirically establishing the adequate sample size, which balances the competing objectives of improving model performance and reducing model complexity as well as computational requirements. Materials and Methods: We empirically assess the effect of sample size on prediction performance and model comple…

    Submitted 14 August, 2020; originally announced August 2020.

  12. arXiv:1908.09207  [pdf, ps, other]

    cs.LG stat.ML

    Demystifying the MLPerf Benchmark Suite

    Authors: Snehil Verma, Qinzhe Wu, Bagus Hanindhito, Gunjan Jha, Eugene B. John, Ramesh Radhakrishnan, Lizy K. John

    Abstract: MLPerf, an emerging machine learning benchmark suite, strives to cover a broad range of applications of machine learning. We present a study on its characteristics and how the MLPerf benchmarks differ from some of the previous deep learning benchmarks like DAWNBench and DeepBench. We find that application benchmarks such as MLPerf (although rich in kernels) exhibit different features compared to ke…

    Submitted 24 August, 2019; originally announced August 2019.

  13. arXiv:1805.12305  [pdf, other]

    cs.DC

    Start Late or Finish Early: A Distributed Graph Processing System with Redundancy Reduction

    Authors: Shuang Song, Xu Liu, Qinzhe Wu, Andreas Gerstlauer, Tao Li, Lizy K. John

    Abstract: Graph processing systems are important in the big data domain. However, processing graphs in parallel often introduces redundant computations in existing algorithms and models. Prior work has proposed techniques to optimize redundancies for the out-of-core graph systems, rather than the distributed graph systems. In this paper, we study various state-of-the-art distributed graph systems and observ…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: 11 pages, 10 figures

  14. arXiv:1511.02134  [pdf, other]

    cs.CE cs.MS math.NA

    A quantitative performance analysis for Stokes solvers at the extreme scale

    Authors: Björn Gmeiner, Markus Huber, Lorenz John, Ulrich Rüde, Barbara Wohlmuth

    Abstract: This article presents a systematic quantitative performance analysis for large finite element computations on extreme scale computing systems. Three parallel iterative solvers for the Stokes system, discretized by low order tetrahedral elements, are compared with respect to their numerical efficiency and their scalability running on up to $786\,432$ parallel threads. A genuine multigrid method for…

    Submitted 6 November, 2015; originally announced November 2015.

    MSC Class: 65N55; 65Y05; 68Q25

  15. arXiv:1504.02205  [pdf, other]

    cs.DC

    BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads

    Authors: Rui Han, Shulin Zhan, Chenrong Shao, Junwei Wang, Lizy K. John, Jiangtao Xu, Gang Lu, Lei Wang

    Abstract: Long-running service workloads (e.g. web search engine) and short-term data analysis workloads (e.g. Hadoop MapReduce jobs) co-locate in today's data centers. Developing realistic benchmarks to reflect such a practical scenario of mixed workloads is a key problem to produce trustworthy results when evaluating and comparing data center systems. This requires using actual workloads as well as guarantee…

    Submitted 4 December, 2015; v1 submitted 9 April, 2015; originally announced April 2015.

    Comments: 12 pages, 5 figures

  16. arXiv:1306.1572  [pdf, other]

    cs.CG math.CO

    Algorithms for detecting dependencies and rigid subsystems for CAD

    Authors: James Farre, Helena Kleinschmidt, Jessica Sidman, Audrey Lee-St. John, Stephanie Stark, Louis Theran, Xilin Yu

    Abstract: Geometric constraint systems underlie popular Computer Aided Design software. Automated approaches for detecting dependencies in a design are critical for developing robust solvers and providing informative user feedback, and we provide algorithms for two types of dependencies. First, we give a pebble game algorithm for detecting generic dependencies. Then, we focus on identifying the "special po…

    Submitted 1 October, 2015; v1 submitted 6 June, 2013; originally announced June 2013.

    Comments: 37 pages, 14 figures (v2 is an expanded version of an AGD'14 abstract based on v1)

  17. arXiv:1210.0451  [pdf, other]

    cs.DM math.CO

    Combinatorics and the Rigidity of CAD Systems

    Authors: Audrey Lee-St. John, Jessica Sidman

    Abstract: We study the rigidity of body-and-cad frameworks which capture the majority of the geometric constraints used in 3D mechanical engineering CAD software. We present a combinatorial characterization of the generic minimal rigidity of a subset of body-and-cad frameworks in which we treat 20 of the 21 body-and-cad constraints, omitting only point-point coincidences. While the handful of classical comb…

    Submitted 17 October, 2012; v1 submitted 1 October, 2012; originally announced October 2012.

    Comments: 17 pages, 7 figures, version to appear in Symposium on Solid and Physical Modeling '12 and associated special issue of Computer Aided Design

    MSC Class: 68R10; 05C50

    ACM Class: G.2.1; J.6; I.3.5

  18. Single-trial EEG Discrimination between Wrist and Finger Movement Imagery and Execution in a Sensorimotor BCI

    Authors: A. K. Mohamed, T. Marwala, L. R. John

    Abstract: A brain-computer interface (BCI) may be used to control a prosthetic or orthotic hand using neural activity from the brain. The core of this sensorimotor BCI lies in the interpretation of the neural information extracted from electroencephalogram (EEG). It is desired to improve on the interpretation of EEG to allow people with neuromuscular disorders to perform daily activities. This paper investi…

    Submitted 26 August, 2011; originally announced August 2011.

    Comments: 33rd Annual International IEEE EMBS Conference 2011

  19. arXiv:1006.1126  [pdf, other]

    cs.CG

    Body-and-cad Geometric Constraint Systems

    Authors: Kirk Haller, Audrey Lee-St. John, Meera Sitharam, Ileana Streinu, Neil White

    Abstract: Motivated by constraint-based CAD software, we develop the foundation for the rigidity theory of a very general model: the body-and-cad structure, composed of rigid bodies in 3D constrained by pairwise coincidence, angular and distance constraints. We identify 21 relevant geometric constraints and develop the corresponding infinitesimal rigidity theory for these structures. The classical body-and-…

    Submitted 6 June, 2010; originally announced June 2010.

    Comments: 33 pages, to appear in Computational Geometry: Theory and Applications (an abbreviated version appeared in: 24th Annual ACM Symposium on Applied Computing, Technical Track on Geometric Constraints and Reasoning GCR'09, Honolulu, HI, 2009)

    MSC Class: 68R10; 05C85; 05C50

    ACM Class: I.3.5; J.6; G.2.1

  20. Archer: A Community Distributed Computing Infrastructure for Computer Architecture Research and Education

    Authors: Renato Figueiredo, P. Oscar Boykin, Jose A. B. Fortes, Tao Li, Jie-Kwon Peir, David Wolinsky, Lizy John, David Kaeli, David Lilja, Sally McKee, Gokhan Memik, Alain Roy, Gary Tyson

    Abstract: This paper introduces Archer, a community-based computing resource for computer architecture research and education. The Archer infrastructure integrates virtualization and batch scheduling middleware to deliver high-throughput computing resources aggregated from resources distributed across wide-area networks and owned by different participating entities in a seamless manner. The paper discusse…

    Submitted 10 July, 2008; originally announced July 2008.

    Comments: 11 pages, 2 figures. Describes the Archer project, http://archer-project.org

    ACM Class: C.0; I.6.3; C.2.4