Please use this identifier to cite or link to this item:
10.1145/3466752.3480073
Title: | Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations |
Publication date: | Oct-2021 |
ISBN: | 978-1-4503-8557-2 |
Keywords: | Multi-address atomics; Read-Modify-Write; Deadlock-free; Non-speculative |
Abstract: | Critical sections that read, modify, and write (RMW) a small set of addresses are common in parallel applications and concurrent data structures. However, to escape from the intricacies of fine-grained locks, which require reasoning about all possible thread interleavings, programmers often resort to coarse-grained locks to ensure atomicity. This results in atomic protection of a much larger set of potentially conflicting addresses, and, consequently, increased lock contention and unneeded serialization. As many before us have observed, these problems would be solved if only general RMW multi-address atomic operations were available, but current proposals are impractical because of deadlock scenarios that appear due to resource limitations. Alternatively, transactional memory can detect conflicts at run time, aiming to maximize concurrency, but it has significant overheads in highly contended critical sections. In this work, we propose multi-address atomic operations (MAD atomics). MAD atomics achieve complexity-effective, non-speculative, non-deadlocking, fine-grained locking for multiple addresses, relying solely on the coherence protocol and a predetermined locking order. Unlike prior works, MAD atomics address the challenge of enabling atomic modification over a set of cachelines with arbitrary addresses, simultaneously locking all of them while sidestepping deadlock. MAD atomics only require a small storage per core (around 68 bytes), while significantly outperforming typical lock implementations. Indeed, our evaluation using gem5-20 shows that MAD atomics can improve performance by up to 18× (3.4×, on average, for the applications and concurrent data structures evaluated in this work) over a baseline implemented with locks running on 16 cores. More importantly, the improvement still reaches 2.7×, on average, compared to an Intel hardware transactional memory implementation running on 16 cores. |
Main author(s): | Gómez-Hernández, Eduardo José; Titos-Gil, Rubén; Cebrián González, Juan Manuel; Kaxiras, Stefanos; Ros, Alberto |
Faculty/Departments/Services: | Faculties, Departments, Services and Schools::UMU Departments::Computer Engineering and Technology |
Part of: | 54th International Symposium on Microarchitecture (MICRO) |
URI: | http://hdl.handle.net/10201/114646 |
DOI: | 10.1145/3466752.3480073 |
Document type: | info:eu-repo/semantics/article |
Number of pages / Extent: | 13 |
Rights: | info:eu-repo/semantics/openAccess; Attribution 4.0 International |
Appears in collections: | Articles: Computer Engineering and Technology |
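The abstract credits deadlock freedom to acquiring every address in a predetermined locking order. As a rough software analogy only (MAD atomics are a hardware mechanism built on the cache coherence protocol, and this sketch is not the authors' design), the Python example below assumes one lock per "cacheline" address, takes all locks for a multi-address read-modify-write in ascending address order, applies the update, and releases them. The names `LockTable`, `multi_address_rmw`, and `transfer` are hypothetical and introduced purely for illustration.

```python
import threading


class LockTable:
    """Hypothetical software analogy: one lock per 'cacheline' address.

    This is NOT the MAD atomics hardware design; it only illustrates how a
    fixed global acquisition order avoids deadlock.
    """

    def __init__(self, num_lines):
        self.memory = [0] * num_lines
        self._locks = [threading.Lock() for _ in range(num_lines)]

    def multi_address_rmw(self, addresses, update):
        """Atomically apply `update` to a small set of addresses.

        `update` receives {addr: old_value} and returns {addr: new_value}.
        Locks are taken in ascending address order (the predetermined order),
        so overlapping calls can never wait on each other in a cycle.
        """
        ordered = sorted(set(addresses))
        for addr in ordered:
            self._locks[addr].acquire()
        try:
            old = {addr: self.memory[addr] for addr in ordered}
            new = update(old)
            # Reject writes outside the locked set before touching memory.
            if not new.keys() <= old.keys():
                raise ValueError("update wrote outside the locked address set")
            for addr, value in new.items():
                self.memory[addr] = value
        finally:
            # Release in reverse order; release order does not affect deadlock.
            for addr in reversed(ordered):
                self._locks[addr].release()


def transfer(table, src, dst, amount):
    """Move `amount` from src to dst as one multi-address RMW."""
    if src == dst:
        return
    table.multi_address_rmw(
        [src, dst],
        lambda vals: {src: vals[src] - amount, dst: vals[dst] + amount},
    )


if __name__ == "__main__":
    table = LockTable(num_lines=4)
    table.memory[:] = [100, 100, 100, 100]

    def worker(src, dst):
        for _ in range(1000):
            transfer(table, src, dst, 1)

    # Locking in argument order (src, then dst) could deadlock these
    # opposite-direction pairs; ascending address order cannot.
    pairs = [(0, 3), (3, 0), (1, 2), (2, 1)]
    threads = [threading.Thread(target=worker, args=p) for p in pairs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(table.memory, sum(table.memory))  # prints [100, 100, 100, 100] 400
```

Because every caller acquires locks in the same ascending order, two transfers in opposite directions over the same pair of addresses can never each hold one lock while waiting for the other, so the opposing threads always finish and the total is preserved. The hardware proposal described in the abstract avoids the software lock table entirely, which is where its per-core storage of around 68 bytes comes in.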
Files in this item:
File | Description | Size | Format |
---|---|---|---|
ejgomez-micro21.pdf | | 773.79 kB | Adobe PDF |
This item is licensed under a Creative Commons license.