Publication: Bounding speculative execution of atomic regions to a single retry
Authors
Gómez Hernández, Eduardo José ; Cebrián, Juan M. ; Kaxiras, Stefanos ; Ros Bardisa, Alberto
Publisher
Association for Computing Machinery
DOI
https://doi.org/10.1145/3622781.3674176
Type
info:eu-repo/semantics/article
Description
© 2024 Copyright is held by the owner/author(s). This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
This document is the Published Manuscript version of a Published Work that appeared in final form in ASPLOS '24. To access the final edited and published work see https://doi.org/10.1145/3622781.3674176
Abstract
Mutual exclusion has long served as a fundamental construct in parallel programs. Despite a long history of optimizing the lower-level lock and unlock operations used to enforce mutual exclusion, such operations largely dictate performance in parallel programs. Speculative Lock Elision, and more generally Hardware Transactional Memory, allow executing atomic regions (ARs) concurrently and speculatively, and ensure correctness by using conflict detection. However, practical implementations of these ideas are best-effort and, in case of conflicts, the execution of ARs is retried a predetermined number of times before falling back to mutual exclusion.
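The best-effort policy described above can be pictured with a small software analogy. This is only a sketch under stated assumptions: `Conflict`, `run_atomic_region`, `MAX_RETRIES`, and the retry budget of 3 are hypothetical names and values chosen for illustration, and a Python exception stands in for a hardware abort. It is not how any particular processor implements the mechanism.

```python
import threading

MAX_RETRIES = 3  # hypothetical retry budget; real systems pick their own value
fallback_lock = threading.Lock()  # coarse-grain lock used when speculation gives up


class Conflict(Exception):
    """Stand-in for a hardware abort caused by a detected conflict."""


def run_atomic_region(speculative_body, fallback_body, max_retries=MAX_RETRIES):
    """Run an atomic region speculatively, falling back to mutual exclusion.

    Returns (result, path), where path is "speculative" or "fallback".
    speculative_body receives the attempt index (0 = first execution).
    """
    # One initial attempt plus up to max_retries re-executions.
    for attempt in range(1 + max_retries):
        try:
            return speculative_body(attempt), "speculative"
        except Conflict:
            continue  # conflict detected: discard the attempt and retry
    # Retry budget exhausted: serialize under the fallback lock.
    with fallback_lock:
        return fallback_body(), "fallback"
```

In this sketch, a region that keeps conflicting pays for every failed attempt before it finally serializes, which is the cost that bounding the number of retries aims to remove.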
This work explores the opportunities of using cacheline locking to bound the number of retries of speculative solutions. Our key insight is that ARs that access exactly the same set of addresses when re-executing can learn that set in the first execution and execute non-speculatively in the next one by performing ordered cacheline locking. This way, speculative execution is bounded to a single retry.
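The idea of locking a learned address set in a fixed order can be illustrated in software. The sketch below is an analogy only, not the hardware mechanism of the paper: the 64-byte line size, the per-line `threading.Lock` objects, and the names `cachelines` and `run_locked` are all assumptions made for illustration. Acquiring the locks in ascending cacheline order means two regions that touch the same lines always request them in the same order, so they cannot deadlock on each other.

```python
import threading
from collections import defaultdict

CACHELINE_BYTES = 64  # assumed line size for this illustration

# One software lock per cacheline stands in for hardware cacheline locking.
_line_locks = defaultdict(threading.Lock)
_table_guard = threading.Lock()  # protects creation of entries in _line_locks


def cachelines(addresses):
    """Distinct cacheline numbers covering the addresses, in ascending order."""
    return sorted({addr // CACHELINE_BYTES for addr in addresses})


def run_locked(learned_addresses, body):
    """Run body non-speculatively while holding every learned cacheline.

    learned_addresses is the address set recorded during the first
    (aborted) execution; the region is assumed to touch the same set again.
    """
    lines = cachelines(learned_addresses)
    with _table_guard:
        locks = [_line_locks[line] for line in lines]
    for lock in locks:  # ascending order: a global order prevents deadlock
        lock.acquire()
    try:
        return body()
    finally:
        for lock in reversed(locks):
            lock.release()
```

For example, `cachelines([130, 5, 64, 70])` yields `[0, 1, 2]`, so a region that learned those four addresses would lock three lines, always starting with line 0.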
We first establish the conditions for ARs to be able to re-execute under a cacheline-locked mode. Based on these conditions, we propose cleAR (cacheline-locked executed AR), a novel technique that, on the first abort, forces the re-execution to use cacheline locking. The detection and conversion to cacheline-locking mode is transparent to software. Using gem5 running data-structure benchmarks and the STAMP benchmark suite, we show that the average percentage of ARs that succeed on the first retry grows from 35.4% in our baseline to 64.4% with cleAR, reducing the percentage of fallback (coarse-grain mutual exclusion) execution from 37.2% to 15.4%. These improvements reduce average execution time by 35.0% over a baseline configuration and by 23.3% over more elaborate approaches such as PowerTM.
This item is subject to a Creative Commons license. http://creativecommons.org/licenses/by-nc-nd/4.0/