Please use this identifier to cite or link to this item: https://dl.acm.org/doi/10.1145/3192366.3192393

Title: SWOOP: Software-Hardware Co-design for Non-speculative, Execute-Ahead, In-Order Cores
Publication date: 11-Jun-2018
Publisher: ACM
ISBN: 978-1-4503-5698-5
Keywords: Hardware-software co-design
Compilers
Memory level parallelism
Abstract: Increasing demands for energy efficiency constrain emerging hardware. These new hardware trends challenge the established assumptions in code generation and force us to rethink existing software optimization techniques. We propose a cross-layer redesign of the way compilers and the underlying microarchitecture are built and interact, to achieve both performance and high energy efficiency. In this paper, we address one of the main performance bottlenecks, last-level cache misses, through a software-hardware co-design. Our approach is able to hide memory latency and attain increased memory and instruction level parallelism by orchestrating a non-speculative, execute-ahead paradigm in software (SWOOP). While out-of-order (OoO) architectures attempt to hide memory latency by dynamically reordering instructions, they do so through expensive, power-hungry, speculative mechanisms. We aim to shift this complexity into software, and we build upon compilation techniques inherited from VLIW, software pipelining, modulo scheduling, decoupled access-execute, and software prefetching. In contrast to previous approaches, we do not rely on either software or hardware speculation that can be detrimental to efficiency. Our SWOOP compiler is enhanced with lightweight architectural support, thus being able to transform applications that include highly complex control flow and indirect memory accesses. The effectiveness of our software-hardware co-design is proven on the most limited but energy-efficient microarchitectures: non-speculative, in-order execution (InO) cores, which rely entirely on compile-time instruction scheduling. We show that (1) our approach achieves its goal of hiding the latency of last-level cache misses and improves performance by 34% and energy efficiency by 23% over the baseline InO core, competitive with an oracle InO core with a perfect last-level cache; and (2) it can even exceed the performance of the oracle core by exposing a higher degree of memory and instruction level parallelism. Moreover, we compare to a modest speculative OoO core, which hides not only the latency of last-level cache misses but most instruction latency, and conclude that while the OoO core is still 39% faster than SWOOP, it pays a steep price for this advantage by doubling the energy consumption.
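The abstract describes hiding last-level cache miss latency by decoupling memory accesses from the computation that consumes them, so that several long-latency misses overlap instead of serializing. The sketch below is only a hand-written illustration of that general access/execute idea, not the SWOOP compiler transformation itself (which is automatic and relies on the lightweight architectural support mentioned above); the names kernel, idx, data, process and the chunk size are hypothetical.

/* Minimal sketch, assuming GCC/Clang (__builtin_prefetch) and a loop with
 * indirect memory accesses. All identifiers are illustrative only. */
#include <stddef.h>

void process(double v);          /* hypothetical per-element work */

void kernel(const int *idx, const double *data, size_t n)
{
    enum { CHUNK = 8 };          /* assumed chunk/unroll factor */

    for (size_t i = 0; i < n; i += CHUNK) {
        size_t end = (i + CHUNK < n) ? i + CHUNK : n;

        /* Access phase: issue the (indirect) loads early as prefetches,
         * so the cache misses for the whole chunk overlap. */
        for (size_t j = i; j < end; ++j)
            __builtin_prefetch(&data[idx[j]], /*read*/0, /*locality*/0);

        /* Execute phase: by the time the values are consumed, the
         * long-latency misses have (ideally) already been serviced. */
        for (size_t j = i; j < end; ++j)
            process(data[idx[j]]);
    }
}

Under this assumption, the overlapped misses are the source of the memory-level parallelism the abstract refers to; the paper's contribution is doing such reordering non-speculatively, in the compiler, for in-order cores.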
Main author(s): Tran, Kim-Anh
Jimborean, Alexandra
Carlson, Trevor E.
Koukos, Konstantinos
Själander, Magnus
Kaxiras, Stefanos
Faculty/Departments/Services: Faculties, Departments, Services and Schools::UMU Departments::Ingeniería y Tecnología de Computadores
Is part of: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'18). ACM, New York, NY, USA. Pages 328–343
URI: http://hdl.handle.net/10201/138972
DOI: https://dl.acm.org/doi/10.1145/3192366.3192393
Document type: info:eu-repo/semantics/article
Number of pages / Extent: 16
Rights: info:eu-repo/semantics/embargoedAccess
Description: © 2018 Association for Computing Machinery. This document is the published version of a work that appeared in final form in PLDI 2018: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. To access the final work, see DOI: https://doi.org/10.1145/3192366.3192393
Appears in collections: Articles: Ingeniería y Tecnología de Computadores

Files in this item:
File: SWOOP.pdf  Description: SWOOP  Size: 665.45 kB  Format: Adobe PDF


Items in Digitum are protected by copyright, with all rights reserved, unless otherwise indicated.