ITSLF: Inter-Thread Store-to-Load Forwarding in Simultaneous Multithreading

Feliú, Josué; Ros, Alberto; Acacio Sánchez, Manuel Eugenio; Kaxiras, Stefanos

Please use this identifier to cite or link to this item: 10.1145/3466752.3480086

Full metadata record

DC Field	Value	Language
dc.contributor.author	Feliú, Josué	-
dc.contributor.author	Ros, Alberto	-
dc.contributor.author	Acacio Sánchez, Manuel Eugenio	-
dc.contributor.author	Kaxiras, Stefanos	-
dc.contributor.other	Facultades, Departamentos, Servicios y Escuelas::Departamentos de la UMU::Ingeniería y Tecnología de Computadores	es
dc.date.accessioned	2021-12-02T20:11:24Z	-
dc.date.available	2021-12-02T20:11:24Z	-
dc.date.issued	2021-10	-
dc.identifier.isbn	978-1-4503-8557-2	-
dc.identifier.uri	http://hdl.handle.net/10201/114645	-
dc.description.abstract	In this paper, we argue that, for a class of fine-grain, synchronization-intensive, parallel workloads, it is advantageous to consolidate synchronization and communication as much as possible among the threads of simultaneous multithreading (SMT) cores. While, today, the shared L1 is the closest coherent level where synchronization and communication between SMT threads can take place, we observe that there is an even closer shared level, entirely inside a single core. This level comprises the load queues (LQ) and store queues (SQ) / store buffers (SB) of the SMT threads and to the best of our knowledge it has never been used as such. The reason is that if we allow communication of different SMT threads via their LQs and SQs/SBs, i.e., inter-thread store-to-load forwarding (ITSLF), we violate write atomicity with respect to the outside world, beyond the acceptable model of read-own-write-early multiple-copy atomicity (rMCA). The key insight of our work is that we can accelerate synchronization and communication among SMT threads with inter-thread store-to-load forwarding, without affecting the memory model—in particular without violating rMCA. We demonstrate how we can achieve this entirely through speculative interactions between LQs and SQs/SBs of different threads, while ensuring deadlock-free execution. Without changing the architectural model, the ISA, or the software, and without adding extra hardware in the form of a specialized accelerator, our insight enables a new design point for a standard architecture. We demonstrate that with ITSLF, workloads scale better on a single 8-way SMT core (with the resources of a single-threaded core) than on a baseline SMT (with or without optimizations), or on 8 single-threaded cores.	es
dc.format	application/pdf	es
dc.format.extent	13	es
dc.language	eng	es
dc.relation	European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (ECHO: Extending Coherence for Hardware-Driven Optimizations in Multicore Architectures, grant agreement No 819134, Consolidator Grant, 2018).	es
dc.relation.ispartof	54th International Symposium on Microarchitecture (MICRO)	es
dc.rights	info:eu-repo/semantics/openAccess	es
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.subject	Simultaneous Multithreading	es
dc.subject	Store-to-Load Forwarding	es
dc.subject	Multiple-Copy Atomicity	es
dc.title	ITSLF: Inter-Thread Store-to-Load Forwarding in Simultaneous Multithreading	es
dc.type	info:eu-repo/semantics/article	es
dc.identifier.doi	10.1145/3466752.3480086	-
Appears in collections:	Artículos: Ingeniería y Tecnología de Computadores

Files in this item:

File	Description	Size	Format
jfeliu-micro21.pdf		1.31 MB	Adobe PDF

This item is licensed under a Creative Commons License.