Please use this identifier to cite or link to this item: http://hdl.handle.net/10201/135369

Full metadata record
DC Field | Value | Language
dc.contributor.author | Singh, Sawan | -
dc.contributor.author | Feliu, Josue | -
dc.contributor.author | Acacio, Manuel E. | -
dc.contributor.author | Jimborean, Alexandra | -
dc.contributor.author | Ros, Alberto | -
dc.date.accessioned | 2023-11-06T13:28:24Z | -
dc.date.available | 2023-11-06T13:28:24Z | -
dc.date.created | 2023 | -
dc.identifier.uri | http://hdl.handle.net/10201/135369 | -
dc.description.abstract | Efficient Total Store Order (TSO) implementations allow loads to execute speculatively out-of-order. To detect order violations, the load queue (LQ) holds all the in-flight loads and is searched on every invalidation and cache eviction. Moreover, in a simultaneous multithreading processor (SMT), stores also search the LQ when writing to cache. LQ searches entail considerable energy consumption. Furthermore, the processor stalls upon encountering the LQ full or when its ports are busy. Hence, the LQ is a critical structure in terms of both energy and performance. In this work, we observe that the use of the LQ could be dramatically optimized under the guarantees of the data-race-free (DRF) property imposed by modern programming languages. To leverage this observation, we propose CELLO, a software-hardware co-design in which the compiler detects memory operations in DRF regions and the hardware optimizes their execution by safely skipping LQ searches without violating the TSO consistency model. Furthermore, CELLO allows removing DRF loads from the LQ earlier, as they do not need to be searched to detect consistency violations. With minimal hardware overhead, we show that an 8-core 2-way SMT processor with CELLO avoids almost all conservative searches to the LQ and significantly reduces its occupancy. CELLO allows i) to reduce the LQ energy expenditure by 33% on average (up to 53%) while performing 2.8% better on average (up to 18.6%) than the baseline system, and ii) to shrink the LQ size from 192 to only 80 entries, reducing the LQ energy expenditure as much as 69% while performing on par with a mainstream LQ implementation. | es
dc.format | application/pdf | es
dc.format.extent | 13 | es
dc.language | eng | es
dc.relation | This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No 819134), from the MCIN/AEI/10.13039/501100011033/ and the “ERDF A way of making Europe”, EU (grant PID2022-136315OB-I00), from the European Union’s Horizon 2021 research and innovation program (grant agreement No 101070374 under HORIZON-CL4-2021-DIGITALEMERGING-01) and RYC2018-025200-I, and from the MCIN/AEI/10.13039/501100011033/ and the European Union NextGenerationEU/PRTR (grants TED2021-130233BC33/C32 and RYC2021-030862-I). | es
dc.relation.ispartof | Is part of 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) | es
dc.rights | info:eu-repo/semantics/openAccess | es
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | *
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | *
dc.title | CELLO: Compiler-Assisted Efficient Load-Load Ordering in Data-Race-Free Regions | es
dc.type | info:eu-repo/semantics/article | es
dc.contributor.department | Ingeniería y Tecnología de Computadores | -
Appears in collections: Articles

Files in this item:
File | Description | Size | Format
CELLO Compiler Assisted Efficient.pdf | | 1.78 MB | Adobe PDF


This item is licensed under a Creative Commons license.