Please use this identifier to cite or link to this item: https://doi.org/10.1145/3613424.3614296

Title: Architectural Support for Optimizing Huge Page Selection Within the OS
Publication date: 30-Oct-2023
ISBN: 979-8-4007-0329-4/23/10
Keywords: Hardware-software co-design
Cache architectures
Memory management
Virtual memory
Operating systems
Graph processing
Abstract: Irregular, memory-intensive applications often incur high translation lookaside buffer (TLB) miss rates that result in significant address translation overheads. Employing huge pages is an effective way to reduce these overheads; however, in real systems the number of available huge pages can be limited when system memory is nearly full and/or fragmented. Thus, huge pages must be used selectively to back application memory. This work demonstrates that choosing memory regions that incur the most TLB misses for huge page promotion best reduces address translation overheads. We call these regions High reUse TLB-sensitive data (HUBs). Unlike prior work, which relies on expensive per-page software counters to identify promotion regions, we propose new architectural support to identify these regions dynamically at application runtime. We propose a promotion candidate cache (PCC) that identifies HUB candidates based on hardware page table walks after a last-level TLB miss. This small, fixed-size structure tracks huge page-aligned regions (consisting of 𝑁 base pages), ranks them based on observed page table walk frequency, and only keeps the most frequently accessed ones. Evaluated on applications of various memory intensity, our approach successfully identifies application pages incurring the highest address translation overheads. Our approach demonstrates that with the help of a PCC, the OS only needs to promote 4% of the application footprint to achieve more than 75% of the peak achievable performance, yielding 1.19-1.33× speedups over 4KB base pages alone. In real systems where memory is typically fragmented, the PCC outperforms Linux’s page promotion policy by 14% (when 50% of total memory is fragmented) and 16% (when 90% of total memory is fragmented), respectively.
Main author(s): Manocha, A.
Yan, Z.
Tureci, E.
Aragón, J.L.
Nellans, D.
Martonosi, M.
Faculty/Departments/Services: Faculties, Departments, Services and Schools::UMU Departments::Ingeniería y Tecnología de Computadores
Part of: 56th ACM/IEEE International Symposium on Microarchitecture (MICRO), Toronto, Canada, pp. 1-14, ISBN: 979-8-4007-0329-4
URI: http://hdl.handle.net/10201/135883
DOI: https://doi.org/10.1145/3613424.3614296
Document type: info:eu-repo/semantics/lecture
Number of pages / Extent: 14
Rights: info:eu-repo/semantics/openAccess
Description: © 2023 Copyright held by the owner/author(s). This document is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/ This document is the Accepted version of a Published Work that appeared in final form in 56th ACM/IEEE International Symposium on Microarchitecture (MICRO), Toronto, Canada. To access the final edited and published work see https://doi.org/10.1145/3613424.3614296
Appears in collections: Articles: Ingeniería y Tecnología de Computadores
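
The abstract describes the promotion candidate cache (PCC) as a small, fixed-size structure that tracks huge page-aligned regions, ranks them by observed page table walk frequency, and keeps only the most frequently walked ones. The following is a minimal software sketch of that idea, not the paper's hardware design. Several details are assumptions made for illustration: the class and method names are hypothetical, huge pages are taken to be 2 MB (so a region spans 512 base pages of 4 KB), the capacity is arbitrary, and eviction of the least frequently walked entry is one possible replacement policy rather than the one the authors use.

```python
from dataclasses import dataclass, field

BASE_PAGE_SHIFT = 12  # 4 KB base pages, as mentioned in the abstract
HUGE_PAGE_SHIFT = 21  # assumption: 2 MB huge pages (512 base pages per region)


@dataclass
class PromotionCandidateCache:
    """Toy model of a PCC. Names, sizes, and policies are illustrative assumptions."""

    capacity: int = 4  # hypothetical number of tracked regions
    counts: dict = field(default_factory=dict)  # region number -> walk count

    def record_walk(self, vaddr: int) -> None:
        """Record one page table walk (triggered by a last-level TLB miss)."""
        region = vaddr >> HUGE_PAGE_SHIFT  # huge page-aligned region number
        if region in self.counts:
            self.counts[region] += 1
            return
        if len(self.counts) >= self.capacity:
            # Assumed policy: drop the region with the fewest observed walks.
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]
        self.counts[region] = 1

    def candidates(self, k: int) -> list:
        """Return base addresses of up to k regions, most-walked first."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return [region << HUGE_PAGE_SHIFT for region, _ in ranked[:k]]


if __name__ == "__main__":
    pcc = PromotionCandidateCache(capacity=2)
    # Five walks to different base pages inside the same 2 MB region.
    for i in range(5):
        pcc.record_walk(0x4000_0000 + (i << BASE_PAGE_SHIFT))
    pcc.record_walk(0x7000_0000)  # a rarely walked region
    pcc.record_walk(0x9000_0000)  # evicts the other rarely walked region
    print([hex(a) for a in pcc.candidates(1)])
```

In this sketch an OS policy would periodically read `candidates(k)` and promote those regions when huge pages are available; how the real PCC exposes its contents to the OS is not stated in the abstract.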

Files in this item:
File: PCC-micro23-camera-ready.pdf
Size: 1.02 MB
Format: Adobe PDF


This item is licensed under a Creative Commons license.