Por favor, use este identificador para citar o enlazar este ítem: http://hdl.handle.net/10201/135369

Título: CELLO: Compiler-Assisted Efficient Load-Load Ordering in Data-Race-Free Regions
Fecha de defensa / creación: 2023
Resumen: Efficient Total Store Order (TSO) implementations allow loads to execute speculatively out-of-order. To detect order violations, the load queue (LQ) holds all the in-flight loads and is searched on every invalidation and cache eviction. Moreover, in a simultaneous multithreading processor (SMT), stores also search the LQ when writing to cache. LQ searches entail considerable energy consumption. Furthermore, the processor stalls upon encountering the LQ full or when its ports are busy. Hence, the LQ is a critical structure in terms of both energy and performance. In this work, we observe that the use of the LQ could be dramatically optimized under the guarantees of the datarace-free (DRF) property imposed by modern programming languages. To leverage this observation, we propose CELLO, a software-hardware co-design in which the compiler detects memory operations in DRF regions and the hardware optimizes their execution by safely skipping LQ searches without violating the TSO consistency model. Furthermore, CELLO allows removing DRF loads from the LQ earlier, as they do not need to be searched to detect consistency violations. With minimal hardware overhead, we show that an 8-core 2- way SMT processor with CELLO avoids almost all conservative searches to the LQ and significantly reduces its occupancy. CELLO allows i) to reduce the LQ energy expenditure by 33% on average (up to 53%) while performing 2.8% better on average (up to 18.6%) than the baseline system, and ii) to shrink the LQ size from 192 to only 80 entries, reducing the LQ energy expenditure as much as 69% while performing on par with a mainstream LQ implementation.
Autor/es principal/es: Singh, Sawan
Feliu, Josue
Acacio, Manuel E.
Jimborean, Alexandra
Ros, Alberto
Facultad/Departamentos/Servicios: Facultades, Departamentos, Servicios y Escuelas::Departamentos de la UMU::Ingeniería y Tecnología de Computadores
Forma parte de: Es parte de 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT)
URI: http://hdl.handle.net/10201/135369
Tipo de documento: info:eu-repo/semantics/article
Número páginas / Extensión: 13
Derechos: info:eu-repo/semantics/openAccess
Attribution-NonCommercial-NoDerivatives 4.0 Internacional
Aparece en las colecciones:Artículos: Ingeniería y Tecnología de Computadores

Ficheros en este ítem:
Fichero Descripción TamañoFormato 
CELLO Compiler Assisted Efficient.pdf1,78 MBAdobe PDFVista previa
Visualizar/Abrir


Este ítem está sujeto a una licencia Creative Commons Licencia Creative Commons Creative Commons