Abstract
Performance counters, also known as hardware counters, are a powerful monitoring mechanism included in the Performance Monitoring Unit (PMU) of most modern microprocessors. Their use is gaining popularity as an analysis and validation tool for profiling, since their performance impact is virtually imperceptible and their precision has noticeably increased thanks to the new Precise Event-Based Sampling (PEBS) features.
In this paper, we present and evaluate a novel user-level tool, based on hardware counters, for monitoring and migrating pages dynamically. The tool supports different migration strategies and can attach to and monitor a target application without modifying it in any way. Page migration is performed in a timely manner, and its overhead is outweighed by the benefit of the improved data locality.
As a case study, an access-based migration algorithm was implemented and integrated into our tool. Performance results on a NUMA system show a noticeable reduction in remote accesses and execution time, with speedups of up to approximately 21% in a multiprogrammed environment.
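The abstract does not spell out the access-based algorithm, so the following is only a minimal sketch of what such a policy might look like, not the authors' implementation. It assumes that hardware-counter sampling yields pairs of (virtual address, NUMA node of the accessing CPU), and it proposes moving a page to the node that issued a strict majority of its sampled accesses. The names `plan_migrations`, `current_node`, and `min_samples`, as well as the 4 KiB page size, are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


def plan_migrations(samples, current_node, min_samples=4):
    """Return {page: target_node} for pages mostly accessed from another node.

    samples      -- iterable of (virtual_address, accessing_node) pairs
    current_node -- dict mapping page number to the node currently holding it
    min_samples  -- pages with fewer sampled accesses than this are ignored
    """
    # Count sampled accesses per page, broken down by accessing node.
    per_page = defaultdict(Counter)
    for address, node in samples:
        per_page[address // PAGE_SIZE][node] += 1

    plan = {}
    for page, counts in per_page.items():
        total = sum(counts.values())
        if total < min_samples:
            continue  # too few samples to justify a decision
        target, hits = counts.most_common(1)[0]
        # Require a strict majority so that ties do not trigger a move.
        if hits * 2 > total and current_node.get(page) != target:
            plan[page] = target
    return plan


if __name__ == "__main__":
    samples = (
        [(0x1000, 1)] * 5 + [(0x1008, 0)]   # page 1: mostly accessed from node 1
        + [(0x2000, 0)] * 4                 # page 2: already local to node 0
        + [(0x3000, 1)] * 2                 # page 3: too few samples
    )
    print(plan_migrations(samples, {1: 0, 2: 0, 3: 0}))
```

In this sketch the planning step is kept separate from the migration itself. On Linux, a plan like this could in principle be applied with the `move_pages` system call, although how the tool actually applies its decisions is not stated in the abstract.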
Acknowledgements
This work has been partially supported by Hewlett-Packard under contract 2008/CE377, by the Ministry of Education and Science of Spain, FEDER funds under contract TIN 2010-17541 and by the Xunta de Galicia (Spain) under contract 2010/28 and project 09TIC002CT. This work was carried out within the framework of the Spanish network CAPAP-H. The authors also wish to thank CESGA for providing access to its supercomputing facilities.
Cite this article
Lorenzo-Castillo, J.A., Pichel, J.C., Rivera, F.F. et al. A flexible and dynamic page migration infrastructure based on hardware counters. J Supercomput 65, 930–948 (2013). https://doi.org/10.1007/s11227-013-0872-4
DOI: https://doi.org/10.1007/s11227-013-0872-4