Static Instruction Scheduling for High Performance on Limited Hardware

Tran, Kim-Anh; Carlson, Trevor E.; Koukos, Konstantinos; Själander, Magnus; Spiliopoulos, Vasileios; Kaxiras, Stefanos; Jimborean, Alexandra

Please use this identifier to cite or link to this item: https://doi.org/10.1109/TC.2017.2769641


Full metadata record

DC Field	Value	Language
dc.contributor.author	Tran, Kim-Anh	-
dc.contributor.author	Carlson, Trevor E.	-
dc.contributor.author	Koukos, Konstantinos	-
dc.contributor.author	Själander, Magnus	-
dc.contributor.author	Spiliopoulos, Vasileios	-
dc.contributor.author	Kaxiras, Stefanos	-
dc.contributor.author	Jimborean, Alexandra	-
dc.date.accessioned	2024-02-06T13:16:34Z	-
dc.date.available	2024-02-06T13:16:34Z	-
dc.date.issued	2018-04-01	-
dc.identifier.citation	IEEE Transactions on Computers, vol. 67, no. 4, pp. 513-527	-
dc.identifier.issn	Print: 0018-9340	-
dc.identifier.issn	Electronic: 1557-9956	-
dc.identifier.uri	http://hdl.handle.net/10201/138783	-
dc.description	© 2018 IEEE. This document is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0). This document is the accepted version of a published work that appeared in final form in IEEE Transactions on Computers, vol. 67, no. 4, pp. 513-527. To access the final work, see DOI: https://doi.org/10.1109/TC.2017.2769641	-
dc.description.abstract	Complex out-of-order (OoO) processors have been designed to overcome the restrictions of outstanding long-latency misses at the cost of increased energy consumption. Simple, limited OoO processors are a compromise in terms of energy consumption and performance, as they have fewer hardware resources to tolerate the penalties of long-latency loads. In the worst case, these loads may stall the processor entirely. We present Clairvoyance, a compiler-based technique that generates code able to hide memory latency and better utilize simple OoO processors. By clustering loads found across basic block boundaries, Clairvoyance overlaps the outstanding latencies to increase memory-level parallelism. We show that these simple OoO processors, equipped with the appropriate compiler support, can effectively hide long-latency loads and achieve performance improvements for memory-bound applications. To this end, Clairvoyance tackles (i) statically unknown dependencies, (ii) insufficient independent instructions, and (iii) register pressure. Clairvoyance achieves a geomean execution time improvement of 14 percent for memory-bound applications, on top of standard O3 optimizations, while maintaining the high performance of compute-bound applications.	es
dc.format	application/pdf	es
dc.format.extent	15	es
dc.language	eng	es
dc.publisher	IEEE	es
dc.relation	This work is supported, in part, by the Swedish Research Council UPMARC Linnaeus Centre and by the Swedish VR (grant no. 2016-05086)	es
dc.rights	info:eu-repo/semantics/openAccess	es
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Compilers	es
dc.subject	Code generation	es
dc.subject	Memory management	es
dc.subject	Optimization	es
dc.title	Static Instruction Scheduling for High Performance on Limited Hardware	es
dc.type	info:eu-repo/semantics/article	es
dc.relation.publisherversion	https://ieeexplore.ieee.org/document/8094900	-
dc.identifier.doi	https://doi.org/10.1109/TC.2017.2769641	-
dc.contributor.department	Departamento de Ingeniería y Tecnología de Computadores	-
Appears in collections:	Articles

Files in this item:

File	Description	Size	Format
2018_TC_Kim_Anh_Tran.pdf		1.45 MB	Adobe PDF


This item is subject to a Creative Commons license.