GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption

Published: 08 December 2023

    Abstract

    Fully Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it. FHE has garnered significant attention over the past decade as it supports secure outsourcing of data processing to remote cloud services. Despite its promise of strong data privacy and security guarantees, FHE introduces a slowdown of up to five orders of magnitude as compared to the same computation using plaintext data. This overhead is presently a major barrier to the commercial adoption of FHE.
    In this work, we leverage GPUs to accelerate FHE, capitalizing on a well-established GPU ecosystem available in the cloud. We propose GME, which combines three key microarchitectural extensions with a compile-time optimization to the current AMD CDNA GPU architecture. First, GME integrates a lightweight on-chip compute unit (CU)-side hierarchical interconnect to retain ciphertext in cache across FHE kernels, thus eliminating redundant memory transactions. Second, to tackle compute bottlenecks, GME introduces special MOD-units that provide native custom hardware support for modular reduction operations, one of the most commonly executed sets of operations in FHE. Third, by integrating the MOD-unit with our novel pipelined 64-bit integer arithmetic cores (WMAC-units), GME further accelerates FHE workloads. Finally, we propose a Locality-Aware Block Scheduler (LABS) that exploits the temporal locality available in FHE primitive blocks. Incorporating these microarchitectural features and compiler optimizations, we create a synergistic approach achieving average speedups of 796×, 14.2×, and 2.3× over Intel Xeon CPU, NVIDIA V100 GPU, and Xilinx FPGA implementations, respectively.
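
To make the modular reduction workload concrete, the following is a minimal software sketch of Barrett reduction, a standard technique that replaces the division in `x mod q` with multiplications, a shift, and one conditional subtraction. It is offered only as an illustration of the kind of operation that dominates FHE arithmetic; the abstract does not say which reduction algorithm the MOD-units implement, so this should not be read as GME's design. The function names (`barrett_precompute`, `barrett_reduce`, `mod_mul`) and the 64-bit word size `k` are assumptions made for the example.

```python
def barrett_precompute(q: int, k: int = 64) -> int:
    """Precompute mu = floor(2^(2k) / q) for a modulus 1 < q < 2^k (assumed word size k)."""
    assert 1 < q < (1 << k)
    return (1 << (2 * k)) // q


def barrett_reduce(x: int, q: int, mu: int, k: int = 64) -> int:
    """Return x mod q for 0 <= x < 2^(2k) without dividing by q."""
    assert 0 <= x < (1 << (2 * k))
    t = (x * mu) >> (2 * k)  # estimate of floor(x / q); never too large, at most 1 too small
    r = x - t * q            # therefore 0 <= r < 2q
    if r >= q:               # a single conditional subtraction corrects the estimate
        r -= q
    return r


def mod_mul(a: int, b: int, q: int, mu: int, k: int = 64) -> int:
    """Modular multiplication of two residues a, b < q using Barrett reduction."""
    return barrett_reduce(a * b, q, mu, k)


if __name__ == "__main__":
    q = (1 << 61) - 1  # example modulus chosen for illustration only
    mu = barrett_precompute(q)
    a, b = 1234567890123456789, 987654321987654321
    print(mod_mul(a, b, q, mu) == (a * b) % q)
```

In a hardware setting the value `mu` would be computed once per modulus, so each reduction costs two wide multiplications and a subtraction rather than a division.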

    Published In

    MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
    October 2023
    1528 pages
    ISBN:9798400703294
    DOI:10.1145/3613424
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CU-side interconnects
    2. Custom accelerators
    3. Fully Homomorphic Encryption (FHE)
    4. Modular reduction
    5. Zero-trust frameworks

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    MICRO '23

    Acceptance Rates

    Overall Acceptance Rate 484 of 2,242 submissions, 22%
