Por favor, use este identificador para citar o enlazar este ítem: 10.1145/3466752.3480073

Título: Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations
Fecha de publicación: oct-2021
ISBN: 978-1-4503-8557-2
Palabras clave: Mutli-address atomics
Read-Modity-Write
Deadlock-free
Non-speculative
Resumen: Critical sections that read, modify, and write (RMW) a small set of addresses are common in parallel applications and concurrent data structures. However, to escape from the intricacies of fine-grained locks, which require reasoning about all possible thread interleavings, programmers often resort to coarse-grained locks to ensure atomicity. This results in atomic protection of a much larger set of potentially conflicting addresses, and, consequently, increased lock contention and unneeded serialization. As many before us have observed, these problems would be solved if only general RMW multi-address atomic operations were available, but current proposals are impractical because of deadlock scenarios that appear due to resource limitations. Alternatively, transactional memory can detect conflicts at run-time aiming to maximize concurrency, but it has significant overheads in highly-contended critical sections. In this work, we propose multi-address atomic operations (MAD atomics). MAD atomics achieve complexity-effective, non-speculative, non-deadlocking, fine-grained locking for multiple addresses, relying solely on the coherence protocol and a predetermined locking order. Unlike prior works, MAD atomics address the challenge of enabling atomic modification over a set of cachelines with arbitrary addresses, simultaneously locking all of them while sidestepping deadlock. MAD atomics only require a small storage per core (around 68 bytes), while significantly outperforming typical lock implementations. Indeed, our evaluation using gem5-20 shows that MAD atomics can improve performance by up to 18×(3.4×, on average, for the applications and concurrent data structures evaluated in this work) over a baseline implemented with locks running on 16 cores. More importantly, the improvement still reaches 2.7×, on average, compared to an Intel hardware transactional memory implementation running on 16 cores.
Autor/es principal/es: Gómez-Hernández, Eduardo José
Titos-Gil, Rubén
Cebrián González, Juan Manuel
Kaxiras, Stefanos
Ros, Alberto
Facultad/Departamentos/Servicios: Facultades, Departamentos, Servicios y Escuelas::Departamentos de la UMU::Ingeniería y Tecnología de Computadores
Forma parte de: 54th International Symposium on Microarchitecture (MICRO)
URI: http://hdl.handle.net/10201/114646
DOI: 10.1145/3466752.3480073
Tipo de documento: info:eu-repo/semantics/article
Número páginas / Extensión: 13
Derechos: info:eu-repo/semantics/openAccess
Atribución 4.0 Internacional
Aparece en las colecciones:Artículos: Ingeniería y Tecnología de Computadores

Ficheros en este ítem:
Fichero Descripción TamañoFormato 
ejgomez-micro21.pdf773,79 kBAdobe PDFVista previa
Visualizar/Abrir


Este ítem está sujeto a una licencia Creative Commons Licencia Creative Commons Creative Commons