Please use this identifier to cite or link to this item: https://doi.org/10.1145/3613424.3614279

Title: GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption
Publication date: 2023
Defense / creation date: 2023
Publisher: ACM
Association for Computing Machinery
Bibliographic citation: MICRO '23: 56th Annual IEEE/ACM International Symposium on Microarchitecture
ISBN: 979-8-4007-0329-4/23/10
Keywords: Zero-trust frameworks
Fully Homomorphic Encryption (FHE)
Custom accelerators
CU-side interconnects
Modular reduction
Abstract: Fully Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it. FHE has garnered significant attention over the past decade as it supports secure outsourcing of data processing to remote cloud services. Despite its promise of strong data privacy and security guarantees, FHE introduces a slowdown of up to five orders of magnitude as compared to the same computation using plaintext data. This overhead is presently a major barrier to the commercial adoption of FHE. While prior efforts recommend moving to custom accelerators to accelerate FHE computing, these solutions lack cost-effectiveness and scalability. In this work, we leverage GPUs to accelerate FHE, capitalizing on a well-established GPU ecosystem that is available in the cloud. We propose GME, which combines three key microarchitectural extensions along with a compile-time optimization to the current AMD CDNA GPU architecture. First, GME integrates a lightweight on-chip compute unit (CU)-side hierarchical interconnect to retain ciphertext in cache across FHE kernels, thus eliminating redundant memory transactions and improving performance. Second, to tackle compute bottlenecks, GME introduces special MOD-units that provide native custom hardware support for modular reduction operations, one of the most commonly executed sets of operations in FHE. Third, by integrating the MOD-unit with our novel pipelined 64-bit integer arithmetic cores (WMAC-units), GME further accelerates FHE workloads by 19%. Finally, we propose a Locality-Aware Block Scheduler (LABS) that improves FHE workload performance, exploiting the temporal locality available in FHE primitive blocks. Incorporating these microarchitectural features and compiler optimizations, we create a synergistic approach achieving average speedups of 796×, 14.2×, and 2.3× over Intel Xeon CPU, NVIDIA V100 GPU, and Xilinx FPGA implementations, respectively.
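The abstract names modular reduction as one of the most commonly executed operations in FHE. As a rough software illustration of what such an operation involves, the sketch below uses Barrett reduction, a standard technique chosen here purely as an assumption; this record does not describe how GME's MOD-unit is actually built. The function names and the example modulus are hypothetical.

```python
def barrett_precompute(q: int):
    """Return (k, mu) for a modulus q > 1, where mu = floor(2**(2k) / q)."""
    k = q.bit_length()             # so that 2**(k-1) <= q < 2**k
    mu = (1 << (2 * k)) // q       # one-time division, reused for every reduction
    return k, mu


def barrett_reduce(x: int, q: int, k: int, mu: int) -> int:
    """Reduce 0 <= x < q*q modulo q using multiplies, shifts and subtractions."""
    t = (x * mu) >> (2 * k)        # estimate of floor(x / q); never exceeds it
    r = x - t * q                  # r is congruent to x mod q and non-negative
    while r >= q:                  # a small number of correction steps
        r -= q
    return r


def mod_mul(a: int, b: int, q: int, k: int, mu: int) -> int:
    """Multiply two residues a, b in [0, q) and reduce the product modulo q."""
    return barrett_reduce(a * b, q, k, mu)


if __name__ == "__main__":
    q = 0xFFFFFFFF00000001         # hypothetical 64-bit example modulus
    k, mu = barrett_precompute(q)
    a, b = 123456789, 987654321
    print(mod_mul(a, b, q, k, mu) == (a * b) % q)
```

The point of the technique is that the per-operation division by `q` is replaced by a multiplication with a precomputed constant, which is the kind of work that dedicated hardware can pipeline.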
Main author(s): Shivdikar, Kaustubh
Agrawal, Rashmi
Jonatan, Gilbert
Abellán, José L.
Livesay, Neal
Joshi, Ajay
Bao, Yuhui
Shen, Michael
Mora, Evelio
Kim, John
Ingare, Alexander
Kaeli, David
Faculty/Departments/Services: Faculties, Departments, Services and Schools::UMU Departments::Ingeniería y Tecnología de Computadores
Part of: MICRO ’23, October 28-November 1, 2023, Toronto, ON, Canada
URI: http://hdl.handle.net/10201/134004
DOI: https://doi.org/10.1145/3613424.3614279
Document type: info:eu-repo/semantics/article
Number of pages / Extent: 14
Rights: info:eu-repo/semantics/openAccess
Attribution 4.0 International
Description: © 2023. The authors. This document is made available under the CC-BY 4.0 license (http://creativecommons.org/licenses/by/4.0/). This document is the published version of a work that appeared in final form in MICRO '23: 56th Annual IEEE/ACM International Symposium on Microarchitecture. To access the final work, see DOI: https://doi.org/10.1145/3613424.3614279
Appears in collections: Articles: Ingeniería y Tecnología de Computadores

Files in this item:
File: GME_MICRO_2023_Camera_Ready.pdf
Size: 1.09 MB
Format: Adobe PDF


This item is licensed under a Creative Commons license.