Please use this identifier to cite or link to this item:
http://hdl.handle.net/10201/150921


Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Meseguer, Nicolás | - |
dc.contributor.author | Sun, Yifan | - |
dc.contributor.author | Pellauer, Michael | - |
dc.contributor.author | Abellán, José L. | - |
dc.contributor.author | Acacio, Manuel E. | - |
dc.date.accessioned | 2025-02-24T11:33:08Z | - |
dc.date.available | 2025-02-24T11:33:08Z | - |
dc.date.issued | 2025-03 | - |
dc.identifier.uri | http://hdl.handle.net/10201/150921 | - |
dc.description.abstract | Achieving peak GPU performance requires optimizing data locality and asynchronous execution to minimize memory access costs and overlap computation with transfers. While features like the Tensor Memory Accelerator (TMA) and warp specialization address these challenges, their complexity often limits programmers. In this work, we present ACTA (Automatic Configuration of the Tensor Memory Accelerator), a software library that simplifies and optimizes TMA usage. By leveraging the GPU Specification Table (GST), ACTA dynamically determines the optimal tile sizes and queue configurations for each kernel and architecture. Its algorithm ensures efficient overlap between memory and computation, drastically reducing programming complexity and eliminating the need for exhaustive design space exploration. Our evaluation across a diverse set of GPU kernels demonstrates that ACTA achieves performance within 2.78% of exhaustive tuning while requiring only a single configuration pass. This makes ACTA a practical and efficient solution for optimizing modern GPU workloads, combining near-optimal performance with significantly reduced programming effort. (An illustrative sketch of the kind of asynchronous-copy pipeline such tuning targets appears below the metadata table.) | - |
dc.format | application/pdf | es |
dc.language | eng | es |
dc.publisher | Association for Computing Machinery (ACM) | es |
dc.relation | This work has been funded by the MCIN/AEI/10.13039/501100011033/ and the "ERDF A way of making Europe", EU, under grant PID2022-136315OB-I00; by MICIU/AEI/10.13039/501100011033 and the "European Union NextGenerationEU/PRTR", under grant RYC2021-031966-I; and partially supported by NSF (US) under awards 2246035 and 2402804. Nicolás Meseguer is supported by fellowship 21803/FPI/22 from Fundación Séneca, Agencia Regional de Ciencia y Tecnología de la Región de Murcia. | es |
dc.relation.ispartof | 16th Int'l Workshop on General Purpose Processing Using GPU (GPGPU '25) | es |
dc.rights | info:eu-repo/semantics/openAccess | es |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.title | ACTA: Automatic Configuration of the Tensor Memory Accelerator for High-End GPUs | es |
dc.type | info:eu-repo/semantics/preprint | es |
dc.contributor.department | Departamento de Ingeniería y Tecnología de Computadores | - |
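
This record contains no source code, so the following is only a hedged sketch of the pattern the abstract describes: a multi-stage asynchronous-copy pipeline whose tile size and queue depth must be chosen per kernel and per architecture, which is the choice ACTA is said to automate from its GPU Specification Table. The sketch uses the generic CUDA C++ cuda::memcpy_async / cuda::pipeline API rather than the TMA-specific instructions the paper targets, and every name in it (scale_pipelined, TILE, STAGES, alpha) is invented for illustration; none of it is ACTA's actual interface.

```cpp
// Illustrative sketch only (not ACTA code): a multi-stage asynchronous copy
// pipeline in CUDA C++. TILE (tile size in elements) and STAGES (number of
// in-flight shared-memory buffers, i.e. the queue depth) stand in for the two
// parameters the abstract says ACTA derives per kernel and architecture.
#include <cuda/pipeline>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

template <int TILE, int STAGES>
__global__ void scale_pipelined(const float* __restrict__ in,
                                float* __restrict__ out,
                                float alpha, size_t n)
{
    // STAGES staging buffers in shared memory, TILE elements each.
    __shared__ float smem[STAGES][TILE];
    __shared__ cuda::pipeline_shared_state<cuda::thread_scope_block, STAGES> state;

    auto block = cg::this_thread_block();
    auto pipe  = cuda::make_pipeline(block, &state);

    // Each block walks its own strided sequence of tiles over the input.
    const size_t first  = static_cast<size_t>(blockIdx.x) * TILE;
    const size_t stride = static_cast<size_t>(gridDim.x) * TILE;
    size_t n_tiles = 0;
    for (size_t off = first; off < n; off += stride) ++n_tiles;

    for (size_t compute_t = 0, fetch_t = 0; compute_t < n_tiles; ++compute_t) {
        // Keep up to STAGES tile copies in flight ahead of the compute stage,
        // so global-memory transfers overlap with the computation below.
        for (; fetch_t < n_tiles && fetch_t < compute_t + STAGES; ++fetch_t) {
            const size_t off = first + fetch_t * stride;
            const size_t len = (n - off < (size_t)TILE) ? (n - off) : (size_t)TILE;
            pipe.producer_acquire();
            cuda::memcpy_async(block, smem[fetch_t % STAGES], in + off,
                               sizeof(float) * len, pipe);
            pipe.producer_commit();
        }
        // Wait for the oldest in-flight tile, compute on it, then recycle it.
        pipe.consumer_wait();
        const size_t off = first + compute_t * stride;
        const size_t len = (n - off < (size_t)TILE) ? (n - off) : (size_t)TILE;
        for (size_t i = block.thread_rank(); i < len; i += block.size())
            out[off + i] = alpha * smem[compute_t % STAGES][i];
        block.sync();              // all threads are done with this buffer
        pipe.consumer_release();
    }
}
```

An exhaustive search would sweep TILE and STAGES (bounded here by the shared-memory budget of STAGES * TILE * sizeof(float) bytes per block) for every kernel and GPU; the abstract's claim is that ACTA instead picks such a configuration in a single pass and lands within 2.78% of that search.
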
Appears in collections: | Artículos |
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
preprint_tma_gpgpu_2025.pdf | | 669,13 kB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons License