Browsing by Subject "Autotuning"
- Publication (Open Access): An autotuning approach to select the inter-GPU communication library on heterogeneous systems (Springer, 2024-12-12). Cámara, Jesús; Cuenca Muñoz, Antonio Javier; Cuenca, Javier; Boratto, Murilo; Vicente Jaén, Arturo; Galindo Garre, Víctor; Ingeniería y Tecnología de Computadores. In this work, an automatic optimisation approach for parallel routines on multi-GPU systems is presented. Several inter-GPU communication libraries (such as CUDA-Aware MPI or NCCL) are used with a set of routines to perform the numerical operations among the GPUs located on the compute nodes. The main objective is the selection of the most appropriate communication library, the number of GPUs to be used and the workload to be distributed among them in order to reduce the cost of data movements, which represent a large percentage of the total execution time. To this end, a hierarchical modelling of the execution time of each routine to be optimised is proposed, combining experimental and theoretical approaches. The results show that near-optimal decisions are taken in all the scenarios analysed.
- Publication (Open Access): HATLib. Software para la instalación y optimización jerárquica de rutinas de álgebra lineal en sistemas heterogéneos [HATLib: software for the hierarchical installation and optimization of linear algebra routines on heterogeneous systems] (2025). Cámara, Jesús; Cuenca Muñoz, Antonio Javier; Giménez, Domingo; Ingeniería y Tecnología de Computadores. This software performs the hierarchical installation of linear algebra routines, self-optimizing their execution as it moves up through the different hardware levels (groupings of compute units) and software levels (hierarchy of routines). During installation, the values of the algorithmic parameters at the current level are determined, and an optimization methodology is applied that uses the installation information stored at lower levels of the hierarchy to execute the routine being installed efficiently. This, in turn, reduces the installation time. The installation process must always start from the lowest level of the hierarchy. In the hardware dimension, this level corresponds to the basic processing units (CPU, GPUs and/or MICs) present in the compute nodes. The next level (level 1) corresponds to the whole compute node or any subset of its compute units (virtual nodes). The last level (level 2) corresponds to using the complete platform or a subset of its nodes (virtual or not), that is, any grouping of compute units of the platform, where each grouping may in turn consist of a subset of compute units. In the software dimension, by contrast, this initial version only supports installing the matrix multiplication routine at the hardware levels mentioned. This routine was considered first because it is the basic computational kernel of the vast majority of numerical linear algebra routines. Once the routine is installed, the application offers other higher-level routines, such as Strassen multiplication or LU factorization, which can be executed at different levels using the self-optimized matrix multiplication routine internally. Future versions will extend its functionality to allow optimized installation of routines at any hardware level of the platform, using a hierarchy of routine levels similar to that established in linear algebra libraries such as BLAS and LAPACK.
- Publication (Embargo): Improving the performance of task-based linear algebra software with autotuning techniques on heterogeneous architectures (Springer Nature, 2023). Cámara, Jesús; Cuenca Muñoz, Antonio Javier; Boratto, Murilo; Ingeniería y Tecnología de Computadores. This work presents several self-optimization strategies to improve the performance of task-based linear algebra software on heterogeneous systems. The study focuses on Chameleon, a task-based dense linear algebra software whose routines are computed using a tile-based algorithmic scheme and executed on the available computing resources of the system using a scheduler which dynamically handles data dependencies among the basic computational kernels of each linear algebra routine. The proposed strategies are applied to select the best values for the parameters that affect the performance of the routines, such as the tile size or the scheduling policy, among others. Also, parallel optimized implementations provided by existing linear algebra libraries, such as Intel MKL (on multicore CPU) or cuBLAS (on GPU), are used to execute each of the computational kernels of the routines. Results obtained on a heterogeneous system composed of several multicore CPUs and GPUs are satisfactory, with performance close to the experimental optimum.
- Publication (Embargo): Integrating software and hardware hierarchies in an autotuning method for parallel routines in heterogeneous clusters (Springer, 2020-03-07). Cámara, Jesús; Cuenca Muñoz, Antonio Javier; Giménez, Domingo; Ingeniería y Tecnología de Computadores. A hierarchical approach for autotuning linear algebra routines on heterogeneous platforms is presented. Hierarchy helps to alleviate the difficulties of tuning parallel routines for high-performance computing systems. This paper analyzes the application of the hierarchical approach at both the hardware and software levels, using the basic matrix multiplication and the Strassen multiplication as proof of concept on multicore+coprocessor nodes. In this way, the hierarchical approach allows partial delegation of the efficient exploitation of the computing units in the node to the underlying direct autotuned matrix multiplication used in the base case.
- Publication (Open Access): On the autotuning of task-based numerical libraries for heterogeneous architectures (IOS Press, 2020). Agullo, Emmanuel; Cámara, Jesús; Cuenca Muñoz, Antonio Javier; Giménez, Domingo; Ingeniería y Tecnología de Computadores; Facultad de Informática. A roadmap for autotuning task-based numerical libraries is presented. Carefully chosen experiments are carried out when the numerical library is being installed to assess its performance. Real and simulated executions are considered to optimize the routine. The discussion is illustrated with a task-based tile Cholesky factorization, and the aim is to find the optimum tile size for any problem size, using the Chameleon numerical linear algebra package on top of the StarPU runtime system and also with the SimGrid simulator. The study shows that combining a smart exploration strategy of the search space with both real and simulated executions results in a fast, reliable autotuning process.
- Publication (Open Access): PARCSIM: a parallel computing simulator for scalable software optimization (2025). Cano, José Carlos; Cámara, Jesús; Cuenca Muñoz, Antonio Javier; Giménez, Domingo; Saura Sánchez, Mariano; Ingeniería y Tecnología de Computadores. PARCSIM is a parallel software simulator that allows a user to capture, through a graphical interface, matrix algorithm schemes that solve scientific problems. With this tool, the user can analyse the execution times that would be obtained by using different spatio-temporal mappings of computational tasks on the available computational units, parallelism parameters and computational libraries. Furthermore, for complex problem models, the self-optimization engine incorporated in this tool analyses the huge tree of possible calculation grouping and mapping strategies in search of the choice that makes the best use of the available hardware resources. This tool also offers polyalgorithmic resolution by automatically making the best decision between different software approaches to solve a given problem on the available hardware system. This work shows the usefulness of this simulator to efficiently solve hierarchical problems constructed from previously modelled subproblems. This task is performed by reusing, in a scalable way, the optimization information of these subproblems to establish the best execution configuration for the composite problem.
- Publication (Open Access): PARCSIM: a parallel computing simulator for scalable software optimization (Springer, 2022-05-16). Cámara, Jesús; Cano, José Carlos; Cuenca Muñoz, Antonio Javier; Saura Sánchez, Mariano; Ingeniería y Tecnología de Computadores. PARCSIM is a parallel software simulator that allows a user to capture, through a graphical interface, matrix algorithm schemes that solve scientific problems. With this tool, the user can analyse the execution times that would be obtained by using different spatio-temporal mappings of computational tasks on the available computational units, parallelism parameters and computational libraries. Furthermore, for complex problem models, the self-optimization engine incorporated in this tool analyses the huge tree of possible calculation grouping and mapping strategies in search of the choice that makes the best use of the available hardware resources. This tool also offers polyalgorithmic resolution by automatically making the best decision between different software approaches to solve a given problem on the available hardware system. This work shows the usefulness of this simulator to efficiently solve hierarchical problems constructed from previously modelled subproblems. This task is performed by reusing, in a scalable way, the optimization information of these subproblems to establish the best execution configuration for the composite problem.
