Boosting Store Buffer Efficiency with Store-Prefetch Bursts

Cebrián González, Juan Manuel; Kaxiras, Stefanos; Ros, Alberto

Please use this identifier to cite or link to this item: http://hdl.handle.net/10201/106144

Full metadata record

DC Field	Value	Language
dc.contributor.author	Cebrián González, Juan Manuel	-
dc.contributor.author	Kaxiras, Stefanos	-
dc.contributor.author	Ros, Alberto	-
dc.date.accessioned	2021-04-08T21:40:53Z	-
dc.date.available	2021-04-08T21:40:53Z	-
dc.date.issued	2020-10	-
dc.identifier.uri	http://hdl.handle.net/10201/106144	-
dc.description.abstract	Virtually all processors today employ a store buffer (SB) to hide store latency. However, when the store buffer is full, store latency is exposed to the processor, causing pipeline stalls. The default strategies to mitigate these stalls are to issue prefetch-for-ownership requests when store instructions commit and to continuously increase the store buffer size. While these strategies considerably increase memory-level parallelism for stores, there are still applications that suffer deeply from stalls caused by the store buffer. Even worse, store-buffer-induced stalls increase considerably when simultaneous multi-threading is enabled, as the store buffer is statically partitioned among the threads. In this paper, we propose a highly selective and very aggressive prefetching strategy to minimize store-buffer-induced stalls. Our proposal, Store-Prefetch Burst (SPB), is based on the following insights: i) the majority of store-buffer-induced stalls are caused by a few stores; ii) the access patterns of such stores are easily predictable; and iii) the latency of the stores is not commonly hidden by standard cache prefetchers, as hiding their latency would require tremendous prefetch aggressiveness. SPB accurately detects contiguous store-access patterns (requiring just 67 bits of storage) and prefetches the remaining memory blocks of the accessed page in a single burst request to the L1 controller. SPB matches the performance of a 1024-entry SB implementation on a 56-entry SB (i.e., Skylake architecture). For a 14-entry SB (e.g., running four logical cores), it achieves 95.0% of that ideal performance, on average, for SPEC CPU 2017. Additionally, a 20-entry store buffer that incorporates SPB achieves the average performance of a standard 56-entry store buffer.	es
dc.format	application/pdf	es
dc.language	eng	es
dc.relation	European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (ECHO: Extending Coherence for Hardware-Driven Optimizations in Multicore Architectures, grant agreement No 819134, Consolidator Grant, 2018).	es
dc.relation.ispartof	53rd International Symposium on Microarchitecture (MICRO)	es
dc.rights	info:eu-repo/semantics/openAccess	es
dc.rights	Attribution-NonCommercial 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/	*
dc.title	Boosting Store Buffer Efficiency with Store-Prefetch Bursts	es
dc.type	info:eu-repo/semantics/article	es
dc.type	info:eu-repo/semantics/lecture	es
dc.contributor.department	Departamento de Ingeniería y Tecnología de Computadores	-
Appears in collections:	Articles
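
The abstract describes SPB as detecting contiguous store-access patterns and then prefetching the remaining blocks of the accessed page in one burst. The following Python model is only a rough sketch of that idea and is not the paper's 67-bit hardware design: the class name, the 64-byte block size, the 4 KiB page size, and the detection threshold of four contiguous block transitions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

BLOCK_SIZE = 64                            # bytes per cache block (assumed)
PAGE_SIZE = 4096                           # bytes per page (assumed)
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE
THRESHOLD = 4                              # hypothetical: next-block transitions before a burst


@dataclass
class ContiguousStoreDetector:
    """Toy software model of a contiguous-store detector (hypothetical)."""
    last_block: Optional[int] = None       # block number of the latest store
    run_length: int = 0                    # consecutive next-block transitions seen
    burst_page: Optional[int] = None       # page that already received a burst

    def observe_store(self, addr: int) -> List[int]:
        """Record a committed store; return block addresses to prefetch."""
        block = addr // BLOCK_SIZE
        if block == self.last_block:
            return []                      # same block: nothing new to learn
        if self.last_block is not None and block == self.last_block + 1:
            self.run_length += 1           # store moved to the next block
        else:
            self.run_length = 0            # pattern broken: start over
        self.last_block = block

        page = block // BLOCKS_PER_PAGE
        if self.run_length < THRESHOLD or page == self.burst_page:
            return []
        self.burst_page = page             # at most one burst per page
        page_end = (page + 1) * BLOCKS_PER_PAGE
        # Burst: every remaining block between the current one and the page end.
        return [b * BLOCK_SIZE for b in range(block + 1, page_end)]
```

Feeding the model a stream of 8-byte stores that walks through a page (addresses 0, 8, 16, ...) yields a single burst once the stores reach the fifth block, covering the rest of that page. A stream of scattered stores yields no burst at all, which reflects the selectivity the abstract emphasizes.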

Files in this item:

File	Description	Size	Format
jcebrian-micro20.pdf		578.53 kB	Adobe PDF

This item is licensed under a Creative Commons License.