Tiny but Mighty: Designing and Realizing Scalable Latency Tolerance for Manycore SoCs

Orenes-Vera, M.; Manocha, A.; Balkind, J.; Gao, F.; Aragón, J.L.; Wentzlaff, D.; Martonosi, M.

Please use this identifier to cite or link to this item: https://doi.org/10.1145/3470496.3527400

Full metadata record

DC Field	Value	Language
dc.contributor.author	Orenes-Vera, M.	-
dc.contributor.author	Manocha, A.	-
dc.contributor.author	Balkind, J.	-
dc.contributor.author	Gao, F.	-
dc.contributor.author	Aragón, J.L.	-
dc.contributor.author	Wentzlaff, D.	-
dc.contributor.author	Martonosi, M.	-
dc.date.accessioned	2024-01-31T17:08:01Z	-
dc.date.available	2024-01-31T17:08:01Z	-
dc.date.issued	2022-06-18	-
dc.identifier.citation	Proc. of the 49th IEEE/ACM International Symposium on Computer Architecture (ISCA), New York, NY, USA, pp. 817-830, ISBN: 978-1-4503-8610-4, June 2022	es
dc.identifier.isbn	978-1-4503-8610-4	-
dc.identifier.uri	http://hdl.handle.net/10201/138304	-
dc.description.abstract	Modern computing systems employ significant heterogeneity and specialization to meet performance targets at manageable power. However, memory latency bottlenecks remain problematic, particularly for sparse neural network and graph analytic applications where indirect memory accesses (IMAs) challenge the memory hierarchy. Decades of prior art have proposed hardware and software mechanisms to mitigate IMA latency, but they fail to analyze real-chip considerations, especially when used in SoCs and manycores. In this paper, we revisit many of these techniques while taking into account manycore integration and verification. We present the first system implementation of latency tolerance hardware that provides significant speedups without requiring any memory hierarchy or processor tile modifications. This is achieved through a Memory Access Parallel-Load Engine (MAPLE), integrated through the Network-on-Chip (NoC) in a scalable manner. Our hardware-software co-design allows programs to perform long-latency memory accesses asynchronously from the core, avoiding pipeline stalls, and enabling greater memory parallelism (MLP). In April 2021 we taped out a manycore chip that includes tens of MAPLE instances for efficient data supply. MAPLE demonstrates a full RTL implementation of out-of-core latency-mitigation hardware, with virtual memory support and automated compilation targeting it. This paper evaluates MAPLE integrated with a dual-core FPGA prototype running applications with full SMP Linux, and demonstrates geomean speedups of 2.35× and 2.27× over software-based prefetching and decoupling, respectively. Compared to state-of-the-art hardware, it provides geomean speedups of 1.82× and 1.72× over prefetching and decoupling techniques.	es
dc.format	application/pdf	es
dc.format.extent	14	es
dc.language	eng	es
dc.publisher	ACM and IEEE	es
dc.relation	PROJECT TITLE: "Diseño de un sistema de memoria de alto rendimiento para aplicaciones emergentes de análisis masivo de datos" Code: 21508/EE/21 Funding body: Fundación Séneca-Agencia de Ciencia y Tecnología, Región de Murcia, Jiménez de la Espada Program PROJECT TITLE: "DECADES: Deeply-Customized Accelerator-Oriented Data Supply Systems Synthesis" Code: FA8650-18-2-7862 Funding body: Defense Advanced Research Projects Agency (DARPA); Program: Software Defined Hardware (SDH); Country: USA PROJECT TITLE: "OpenPiton 2: Enabling Open Source Manycore Hardware Research" Code: NSF Grant ID CNS-1823222 Funding body: US National Science Foundation (NSF); Program: Community Infrastructure for Research (CRI); Country: USA	es
dc.relation.ispartof	49th IEEE/ACM International Symposium on Computer Architecture (ISCA), New York, NY, USA, June 2022	es
dc.relation.requires	https://doi.org/10.1145/3470496.3527400	es
dc.rights	info:eu-repo/semantics/openAccess	es
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.subject	Computer systems organization	es
dc.subject	Multicore architectures	es
dc.subject	Reconfigurable computing	es
dc.subject	Heterogeneous systems	es
dc.subject	Memory	es
dc.subject	Latency tolerance	es
dc.subject	Decoupling	es
dc.subject	Modular RTL	es
dc.title	Tiny but Mighty: Designing and Realizing Scalable Latency Tolerance for Manycore SoCs	es
dc.type	info:eu-repo/semantics/lecture	es
dc.identifier.doi	https://doi.org/10.1145/3470496.3527400	-
dc.contributor.department	Ingeniería y Tecnología de Computadores	-
Appears in collections:	Articles

Files in this item:

File	Description	Size	Format
MAPLE-ISCA2022-final.pdf	Publisher's version	1.33 MB	Adobe PDF

This item is licensed under a Creative Commons license.