Please use this identifier to cite or link to this item:
http://hdl.handle.net/10201/150921


Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Meseguer, Nicolás | - |
dc.contributor.author | Sun, Yifan | - |
dc.contributor.author | Pellauer, Michael | - |
dc.contributor.author | Abellán, José L. | - |
dc.contributor.author | Acacio, Manuel E. | - |
dc.date.accessioned | 2025-02-24T11:33:08Z | - |
dc.date.available | 2025-02-24T11:33:08Z | - |
dc.date.issued | 2025-03 | - |
dc.identifier.uri | http://hdl.handle.net/10201/150921 | - |
dc.description.abstract | Achieving peak GPU performance requires optimizing data locality and asynchronous execution to minimize memory access costs and overlap computation with transfers. While features like the Tensor Memory Accelerator (TMA) and warp specialization address these challenges, their complexity often limits programmers. In this work, we present ACTA (Automatic Configuration of the Tensor Memory Accelerator), a software library that simplifies and optimizes TMA usage. By leveraging the GPU Specification Table (GST), ACTA dynamically determines the optimal tile sizes and queue configurations for each kernel and architecture. Its algorithm ensures efficient overlap between memory and computation, drastically reducing programming complexity and eliminating the need for exhaustive design space exploration. Our evaluation across a diverse set of GPU kernels demonstrates that ACTA achieves performance within 2.78% of exhaustive tuning while requiring only a single configuration pass. This makes ACTA a practical and efficient solution for optimizing modern GPU workloads, combining near-optimal performance with significantly reduced programming effort. (An illustrative sketch of the kind of asynchronous-copy pipeline such tuning targets appears below the metadata table.) | - |
dc.format | application/pdf | es |
dc.language | eng | es |
dc.publisher | Association for Computing Machinery (ACM) | es |
dc.relation | This work has been funded by the MCIN/AEI/10.13039/501100011033/ and the "ERDF A way of making Europe", EU, under grant PID2022-136315OB-I00; by MICIU/AEI/10.13039/501100011033 and the "European Union NextGenerationEU/PRTR", under grant RYC2021-031966-I; and partially supported by NSF (US) under awards 2246035 and 2402804. Nicolás Meseguer is supported by fellowship 21803/FPI/22 from Fundación Séneca, Agencia Regional de Ciencia y Tecnología de la Región de Murcia. | es |
dc.relation.ispartof | 16th Int'l Workshop on General Purpose Processing Using GPU (GPGPU '25) | es |
dc.rights | info:eu-repo/semantics/openAccess | es |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.title | ACTA: Automatic Configuration of the Tensor Memory Accelerator for High-End GPUs | es |
dc.type | info:eu-repo/semantics/preprint | es |
dc.contributor.department | Departamento de Ingeniería y Tecnología de Computadores | - |
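
This record contains no source code, so the following is only a hedged sketch of the pattern the abstract describes: a multi-stage asynchronous-copy pipeline whose tile size and queue depth must be chosen per kernel and per architecture, which is the choice ACTA is said to automate from its GPU Specification Table. The sketch uses the generic CUDA C++ cuda::memcpy_async / cuda::pipeline API rather than the TMA-specific instructions the paper targets, and every name in it (scale_pipelined, TILE, STAGES, alpha) is invented for illustration; none of it is ACTA's actual interface.

```cpp
// Illustrative sketch only (not ACTA code): a multi-stage asynchronous copy
// pipeline in CUDA C++. TILE (tile size in elements) and STAGES (number of
// in-flight shared-memory buffers, i.e. the queue depth) stand in for the two
// parameters the abstract says ACTA derives per kernel and architecture.
#include <cuda/pipeline>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

template <int TILE, int STAGES>
__global__ void scale_pipelined(const float* __restrict__ in,
                                float* __restrict__ out,
                                float alpha, size_t n)
{
    // STAGES staging buffers in shared memory, TILE elements each.
    __shared__ float smem[STAGES][TILE];
    __shared__ cuda::pipeline_shared_state<cuda::thread_scope_block, STAGES> state;

    auto block = cg::this_thread_block();
    auto pipe  = cuda::make_pipeline(block, &state);

    // Each block walks its own strided sequence of tiles over the input.
    const size_t first  = static_cast<size_t>(blockIdx.x) * TILE;
    const size_t stride = static_cast<size_t>(gridDim.x) * TILE;
    size_t n_tiles = 0;
    for (size_t off = first; off < n; off += stride) ++n_tiles;

    for (size_t compute_t = 0, fetch_t = 0; compute_t < n_tiles; ++compute_t) {
        // Keep up to STAGES tile copies in flight ahead of the compute stage,
        // so global-memory transfers overlap with the computation below.
        for (; fetch_t < n_tiles && fetch_t < compute_t + STAGES; ++fetch_t) {
            const size_t off = first + fetch_t * stride;
            const size_t len = (n - off < (size_t)TILE) ? (n - off) : (size_t)TILE;
            pipe.producer_acquire();
            cuda::memcpy_async(block, smem[fetch_t % STAGES], in + off,
                               sizeof(float) * len, pipe);
            pipe.producer_commit();
        }
        // Wait for the oldest in-flight tile, compute on it, then recycle it.
        pipe.consumer_wait();
        const size_t off = first + compute_t * stride;
        const size_t len = (n - off < (size_t)TILE) ? (n - off) : (size_t)TILE;
        for (size_t i = block.thread_rank(); i < len; i += block.size())
            out[off + i] = alpha * smem[compute_t % STAGES][i];
        block.sync();              // all threads are done with this buffer
        pipe.consumer_release();
    }
}
```

An exhaustive search would sweep TILE and STAGES (bounded here by the shared-memory budget of STAGES * TILE * sizeof(float) bytes per block) for every kernel and GPU; the abstract's claim is that ACTA instead picks such a configuration in a single pass and lands within 2.78% of that search.
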
Appears in collections: | Artículos |
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
preprint_tma_gpgpu_2025.pdf | | 669,13 kB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons License