Please use this identifier to cite or link to this item: http://hdl.handle.net/10201/150921

Full metadata record

dc.contributor.author: Meseguer, Nicolás
dc.contributor.author: Sun, Yifan
dc.contributor.author: Pellauer, Michael
dc.contributor.author: Abellán, José L.
dc.contributor.author: Acacio, Manuel E.
dc.date.accessioned: 2025-02-24T11:33:08Z
dc.date.available: 2025-02-24T11:33:08Z
dc.date.issued: 2025-03
dc.identifier.uri: http://hdl.handle.net/10201/150921
dc.description.abstract: Achieving peak GPU performance requires optimizing data locality and asynchronous execution to minimize memory access costs and overlap computation with transfers. While features like the Tensor Memory Accelerator (TMA) and warp specialization address these challenges, their complexity often limits adoption by programmers. In this work, we present ACTA (Automatic Configuration of the Tensor Memory Accelerator), a software library that simplifies and optimizes TMA usage. By leveraging the GPU Specification Table (GST), ACTA dynamically determines the optimal tile sizes and queue configurations for each kernel and architecture. Its algorithm ensures efficient overlap between memory and computation, drastically reducing programming complexity and eliminating the need for exhaustive design space exploration. Our evaluation across a diverse set of GPU kernels demonstrates that ACTA achieves performance within 2.78% of exhaustive tuning while requiring only a single configuration pass. This makes ACTA a practical and efficient solution for optimizing modern GPU workloads, combining near-optimal performance with significantly reduced programming effort.
dc.format: application/pdf
dc.language: eng
dc.publisher: Association for Computing Machinery (ACM)
dc.relation: This work has been funded by the MCIN/AEI/10.13039/501100011033/ and the "ERDF A way of making Europe", EU, under grant PID2022-136315OB-I00; by MICIU/AEI/10.13039/501100011033 and the "European Union NextGenerationEU/PRTR", under grant RYC2021-031966-I; and partially supported by NSF (US) under awards 2246035 and 2402804. Nicolás Meseguer is supported by fellowship 21803/FPI/22 from Fundación Séneca, Agencia Regional de Ciencia y Tecnología de la Región de Murcia.
dc.relation.ispartof: 16th Int'l Workshop on General Purpose Processing Using GPU (GPGPU '25)
dc.rights: info:eu-repo/semantics/openAccess
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: ACTA: Automatic Configuration of the Tensor Memory Accelerator for High-End GPUs
dc.type: info:eu-repo/semantics/preprint
dc.contributor.department: Departamento de Ingeniería y Tecnología de Computadores
Appears in collections: Artículos

Files in this item:
File: preprint_tma_gpgpu_2025.pdf
Size: 669,13 kB
Format: Adobe PDF


This item is licensed under a Creative Commons License.