CELLO: Compiler-Assisted Efficient Load-Load
Ordering in Data-Race-Free Regions

Singh, Sawan; Feliu, Josue; Acacio, Manuel E.; Jimborean, Alexandra; Ros, Alberto

Please use this identifier to cite or link to this item: http://hdl.handle.net/10201/135369


Full metadata record

DC Field	Value	Language
dc.contributor.author	Singh, Sawan	-
dc.contributor.author	Feliu, Josue	-
dc.contributor.author	Acacio, Manuel E.	-
dc.contributor.author	Jimborean, Alexandra	-
dc.contributor.author	Ros, Alberto	-
dc.date.accessioned	2023-11-06T13:28:24Z	-
dc.date.available	2023-11-06T13:28:24Z	-
dc.date.created	2023	-
dc.identifier.uri	http://hdl.handle.net/10201/135369	-
dc.description.abstract	Efficient Total Store Order (TSO) implementations allow loads to execute speculatively out of order. To detect order violations, the load queue (LQ) holds all in-flight loads and is searched on every invalidation and cache eviction. Moreover, in a simultaneous multithreading (SMT) processor, stores also search the LQ when writing to the cache. LQ searches entail considerable energy consumption. Furthermore, the processor stalls when the LQ is full or when its ports are busy. Hence, the LQ is a critical structure in terms of both energy and performance. In this work, we observe that the use of the LQ could be dramatically optimized under the guarantees of the data-race-free (DRF) property imposed by modern programming languages. To leverage this observation, we propose CELLO, a software-hardware co-design in which the compiler detects memory operations in DRF regions and the hardware optimizes their execution by safely skipping LQ searches without violating the TSO consistency model. Furthermore, CELLO allows removing DRF loads from the LQ earlier, as they do not need to be searched to detect consistency violations. With minimal hardware overhead, we show that an 8-core 2-way SMT processor with CELLO avoids almost all conservative searches to the LQ and significantly reduces its occupancy. CELLO makes it possible (i) to reduce the LQ energy expenditure by 33% on average (up to 53%) while performing 2.8% better on average (up to 18.6%) than the baseline system, and (ii) to shrink the LQ from 192 to only 80 entries, reducing the LQ energy expenditure by as much as 69% while performing on par with a mainstream LQ implementation.	es
dc.format	application/pdf	es
dc.format.extent	13	es
dc.language	eng	es
dc.relation	This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No 819134), from the MCIN/AEI/10.13039/501100011033/ and the “ERDF A way of making Europe”, EU (grant PID2022-136315OB-I00), from the European Union’s Horizon 2021 research and innovation program (grant agreement No 101070374 under HORIZON-CL4-2021-DIGITALEMERGING-01) and RYC2018-025200-I, and from the MCIN/AEI/10.13039/501100011033/ and the European Union NextGenerationEU/PRTR (grants TED2021-130233BC33/C32 and RYC2021-030862-I).	es
dc.relation.ispartof	Part of the 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT)	es
dc.rights	info:eu-repo/semantics/openAccess	es
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.title	CELLO: Compiler-Assisted Efficient Load-Load Ordering in Data-Race-Free Regions	es
dc.type	info:eu-repo/semantics/article	es
dc.contributor.department	Ingeniería y Tecnología de Computadores	-
Appears in collections:	Articles

Files in this item:

File	Description	Size	Format
CELLO Compiler Assisted Efficient.pdf		1.78 MB	Adobe PDF


This item is licensed under a Creative Commons license.