Please use this identifier to cite or link to this item:
https://doi.org/10.1145/3622781.3674176


Title: | Bounding speculative execution of atomic regions to a single retry |
Publication date: | 1 May 2024 |
Publisher: | Association for Computing Machinery |
ISBN: | 979-8-4007-0391-1 |
Abstract: | Mutual exclusion has long served as a fundamental construct in parallel programs. Despite a long history of optimizing the lower-level lock and unlock operations used to enforce mutual exclusion, such operations largely dictate performance in parallel programs. Speculative Lock Elision, and more generally Hardware Transactional Memory, allow executing atomic regions (ARs) concurrently and speculatively, and ensure correctness by using conflict detection. However, practical implementations of these ideas are best-effort and, in case of conflicts, the execution of ARs is retried a predetermined number of times before falling back to mutual exclusion. This work explores the opportunities of using cacheline locking to bound the number of retries of speculative solutions. Our key insight is that ARs that access exactly the same set of addresses when re-executing can learn that set in the first execution and execute non-speculatively in the next one by performing an ordered cacheline locking. This way the speculative execution is bounded to a single retry. We first establish the conditions for ARs to be able to re-execute under a cacheline-locked mode. Based on these conditions, we propose cleAR, cacheline-locked executed AR, a novel technique that, on the first abort, forces the re-execution to use cacheline locking. The detection and conversion to cacheline-locking mode is transparent to software. Using gem5 running data-structure benchmarks and the STAMP benchmark suite, we show that the average number of ARs that succeed on the first retry grows from 35.4% in our baseline to 64.4% with cleAR, reducing the percentage of fallback (coarse-grain mutual exclusion) execution from 37.2% to 15.4%. These improvements reduce average execution time by 35.0% over a baseline configuration and by 23.3% over more elaborate approaches such as PowerTM. |
Main author(s): | Gómez Hernández, Eduardo José; Cebrián, Juan M.; Kaxiras, Stefanos; Ros Bardisa, Alberto |
Part of: | ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Vol. 4, pp. 17-30 |
Publisher's version: | https://dl.acm.org/doi/10.1145/3622781.3674176 |
URI: | http://hdl.handle.net/10201/154740 |
DOI: | https://doi.org/10.1145/3622781.3674176 |
Document type: | info:eu-repo/semantics/article |
Number of pages / Extent: | 14 |
Rights: | info:eu-repo/semantics/openAccess Attribution-NonCommercial-NoDerivatives 4.0 International |
Description: | © 2024 Copyright is held by the owner/author(s). This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/ This document is the Published Manuscript version of a Published Work that appeared in final form in ASPLOS '24. To access the final edited and published work see https://doi.org/10.1145/3622781.3674176 |
Appears in collections: | Articles |
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
ejgomez-asplos24.pdf | | 719.08 kB | Adobe PDF | View/Open |
This item is subject to a Creative Commons license.
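
The abstract's key insight is that an atomic region which touches the same addresses on re-execution can lock the cachelines learned in its first attempt, in a fixed order, and then run non-speculatively. The following is only a rough software analogy of that ordering idea, not the cleAR hardware mechanism: the 64-byte line size, the `LineLockTable` class, and the functions `cachelines`, `run_cacheline_locked`, and `demo` are all hypothetical names introduced for illustration, and the speculative first attempt that learns the address set is not modelled.

```python
import threading

CACHELINE_BYTES = 64  # assumed line size, not taken from the paper


class LineLockTable:
    """Hypothetical software stand-in for per-cacheline hardware locks."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, line):
        # Create the lock for a line lazily, under a guard to avoid races.
        with self._guard:
            return self._locks.setdefault(line, threading.Lock())


def cachelines(addresses):
    """Return the sorted, de-duplicated cacheline indices for byte addresses."""
    return sorted({addr // CACHELINE_BYTES for addr in addresses})


def run_cacheline_locked(table, learned_addresses, body):
    """Run `body` while holding the lock of every learned cacheline.

    Locks are acquired in ascending line order, so two regions with
    overlapping address sets cannot wait on each other in a cycle.
    """
    held = []
    try:
        for line in cachelines(learned_addresses):
            lock = table.lock_for(line)
            lock.acquire()
            held.append(lock)
        return body()
    finally:
        for lock in reversed(held):
            lock.release()


def demo(iterations=1000):
    """Two threads move value in opposite directions between two lines."""
    table = LineLockTable()
    memory = {0: 100, 128: 100}  # byte address -> value, on different lines

    def transfer(src, dst):
        def body():
            memory[src] -= 1
            memory[dst] += 1

        for _ in range(iterations):
            # The address set is given here; in the paper's setting it would
            # be learned from the first (aborted) speculative execution.
            run_cacheline_locked(table, [src, dst], body)

    threads = [
        threading.Thread(target=transfer, args=(0, 128)),
        threading.Thread(target=transfer, args=(128, 0)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return memory
```

Although the two threads name their addresses in opposite orders, both acquire line 0 before line 2, which is what rules out deadlock in this sketch; calling `demo()` should leave both balances unchanged because every transfer runs with both lines held.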