Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations

Gómez-Hernández, Eduardo José; Titos-Gil, Rubén; Cebrián González, Juan Manuel; Kaxiras, Stefanos; Ros, Alberto

Please use this identifier to cite or link to this item: 10.1145/3466752.3480073

Title:	Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations
Publication date:	Oct-2021
ISBN:	978-1-4503-8557-2
Keywords:	Multi-address atomics; Read-Modify-Write; Deadlock-free; Non-speculative
Abstract:	Critical sections that read, modify, and write (RMW) a small set of addresses are common in parallel applications and concurrent data structures. However, to escape the intricacies of fine-grained locks, which require reasoning about all possible thread interleavings, programmers often resort to coarse-grained locks to ensure atomicity. This results in atomic protection of a much larger set of potentially conflicting addresses and, consequently, increased lock contention and unneeded serialization. As many before us have observed, these problems would be solved if general RMW multi-address atomic operations were available, but current proposals are impractical because of deadlock scenarios that arise from resource limitations. Alternatively, transactional memory can detect conflicts at run time to maximize concurrency, but it has significant overheads in highly contended critical sections. In this work, we propose multi-address atomic operations (MAD atomics). MAD atomics achieve complexity-effective, non-speculative, non-deadlocking, fine-grained locking for multiple addresses, relying solely on the coherence protocol and a predetermined locking order. Unlike prior works, MAD atomics address the challenge of enabling atomic modification over a set of cachelines with arbitrary addresses, simultaneously locking all of them while sidestepping deadlock. MAD atomics require only a small amount of storage per core (around 68 bytes), while significantly outperforming typical lock implementations. Indeed, our evaluation using gem5-20 shows that MAD atomics can improve performance by up to 18× (3.4× on average for the applications and concurrent data structures evaluated in this work) over a baseline implemented with locks running on 16 cores. More importantly, the improvement still reaches 2.7× on average compared to an Intel hardware transactional memory implementation running on 16 cores.
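The abstract attributes deadlock freedom to a predetermined locking order. As a rough software analogy only (MAD atomics implement this in hardware through the coherence protocol, which this sketch does not model), the Python below locks every cacheline that an update touches in ascending order before modifying any of them, then releases them in reverse. `LockedMemory`, `multi_address_rmw`, and the 64-byte `CACHELINE_BYTES` are hypothetical names and assumptions, not part of the paper.

```python
import threading
from typing import Callable, Dict

CACHELINE_BYTES = 64  # assumption: 64-byte cachelines


class LockedMemory:
    """Hypothetical software model: flat word memory with one lock per cacheline."""

    def __init__(self, size_words: int, word_bytes: int = 8):
        self.word_bytes = word_bytes
        self.mem = [0] * size_words
        n_lines = (size_words * word_bytes + CACHELINE_BYTES - 1) // CACHELINE_BYTES
        self.line_locks = [threading.Lock() for _ in range(n_lines)]

    def _line(self, word_index: int) -> int:
        # Map a word index to the cacheline that holds it.
        return (word_index * self.word_bytes) // CACHELINE_BYTES

    def multi_address_rmw(self, updates: Dict[int, Callable[[int], int]]) -> None:
        # Collapse the words to the distinct cachelines they touch and sort them.
        # Every caller acquires locks in the same global (ascending) order, so no
        # cycle of waiting threads, and therefore no deadlock, can form.
        lines = sorted({self._line(w) for w in updates})
        for line in lines:
            self.line_locks[line].acquire()
        try:
            # All lines are held: apply each read-modify-write atomically as a group.
            for w, f in updates.items():
                self.mem[w] = f(self.mem[w])
        finally:
            for line in reversed(lines):
                self.line_locks[line].release()


memory = LockedMemory(size_words=64)
memory.mem[0] = 1000   # word 0 lives in cacheline 0
memory.mem[40] = 1000  # word 40 (byte 320) lives in cacheline 5


def worker(src: int, dst: int, n: int) -> None:
    # Move one unit from src to dst, n times, as a two-address atomic update.
    for _ in range(n):
        memory.multi_address_rmw({src: lambda v: v - 1, dst: lambda v: v + 1})


t1 = threading.Thread(target=worker, args=(0, 40, 500))
t2 = threading.Thread(target=worker, args=(40, 0, 500))
t1.start()
t2.start()
t1.join()
t2.join()
print(memory.mem[0] + memory.mem[40])  # prints 2000
```

The two threads name the same pair of addresses in opposite orders; because both sort the cachelines before locking, they cannot end up each holding one lock while waiting for the other, which is the classic deadlock that locking in argument order would allow.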
Main author(s):	Gómez-Hernández, Eduardo José; Titos-Gil, Rubén; Cebrián González, Juan Manuel; Kaxiras, Stefanos; Ros, Alberto
Part of:	54th International Symposium on Microarchitecture (MICRO)
URI:	http://hdl.handle.net/10201/114646
DOI:	10.1145/3466752.3480073
Document type:	info:eu-repo/semantics/article
Number of pages:	13
Rights:	info:eu-repo/semantics/openAccess; Attribution 4.0 International
Appears in collections:	Articles

Files in this item:

File	Description	Size	Format
ejgomez-micro21.pdf		773.79 kB	Adobe PDF

This item is licensed under a Creative Commons License.