


Link to original content: https://unpaywall.org/10.1145/2485922.2485964
GPUWattch | Proceedings of the 40th Annual International Symposium on Computer Architecture
DOI: 10.1145/2485922.2485964
research-article

GPUWattch: enabling energy optimizations in GPGPUs

Published: 23 June 2013

Abstract

General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and performance per watt has emerged as a more crucial evaluation metric than peak performance. As such, GPU architects require robust tools that will enable them to quickly explore new ways to optimize GPGPUs for energy efficiency. We propose a new GPGPU power model that is configurable, capable of cycle-level calculations, and carefully validated against real hardware measurements. To achieve configurability, we use a bottom-up methodology and abstract parameters from the microarchitectural components as the model's inputs. We developed a rigorous suite of 80 microbenchmarks that we use to bound any modeling uncertainties and inaccuracies. The power model is comprehensively validated against measurements of two commercially available GPUs, and the measured error is within 9.9% and 13.4% for the two target GPUs (GTX 480 and Quadro FX5600), respectively. The model also accurately tracks the power consumption trend over time. We integrated the power model with the cycle-level simulator GPGPU-Sim and demonstrate energy savings from dynamic voltage and frequency scaling (DVFS) and clock gating. Traditional DVFS reduces GPU energy consumption by 14.4% by leveraging within-kernel runtime variations. Finer-grained SM cluster-level DVFS improves the energy savings from 6.6% to 13.6% for those benchmarks that show clustered execution behavior. We also show that clock gating inactive lanes during divergence reduces dynamic power by 11.2%.
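The bottom-up idea in the abstract (estimating power from microarchitectural activity counts multiplied by per-component energy costs) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GPUWattch's implementation: the `Component` class, all function names, the per-access energies, and the first-order (V/V_nom)^2 scaling of dynamic energy under DVFS are hypothetical choices made for illustration. The `alu_lane_activations` helper shows, in simplified form, why clock gating inactive lanes during divergence lowers dynamic power: gated lanes are assumed to switch no logic at all. The abstract does not define the error metric behind the 9.9% and 13.4% figures, so `relative_error` is only one plausible definition.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A hypothetical microarchitectural component in a bottom-up power model."""
    name: str
    energy_per_access_nj: float  # assumed dynamic energy per access at nominal Vdd, in nJ


def dynamic_power_w(components, access_counts, cycles, freq_hz, vdd, vdd_nom):
    """Average dynamic power (W) over an interval, computed from activity counts.

    Each component contributes accesses * energy_per_access. Energy per access
    is scaled by (vdd / vdd_nom) ** 2, the first-order C * V^2 dependence, and
    the interval length is cycles / freq_hz seconds.
    """
    scale = (vdd / vdd_nom) ** 2
    energy_j = sum(
        access_counts.get(c.name, 0) * c.energy_per_access_nj * 1e-9 * scale
        for c in components
    )
    return energy_j / (cycles / freq_hz)


def alu_lane_activations(active_lane_counts, warp_size=32, clock_gate=True):
    """Total per-lane ALU activations for a sequence of warp instructions.

    active_lane_counts holds the number of active lanes for each instruction.
    With clock gating, only active lanes are assumed to switch; without it,
    all warp_size lanes are clocked on every instruction.
    """
    if clock_gate:
        return sum(active_lane_counts)
    return warp_size * len(active_lane_counts)


def relative_error(modeled_w, measured_w):
    """Absolute relative error of a modeled power value against a measurement."""
    return abs(modeled_w - measured_w) / measured_w


if __name__ == "__main__":
    comps = [Component("alu_lane", 0.01), Component("reg_file", 0.02)]
    lanes = [32, 16, 16, 32]  # two of four warp instructions diverged to 16 lanes
    for gate in (False, True):
        counts = {"alu_lane": alu_lane_activations(lanes, clock_gate=gate),
                  "reg_file": len(lanes)}
        p = dynamic_power_w(comps, counts, cycles=4, freq_hz=1e9, vdd=1.0, vdd_nom=1.0)
        print(f"clock_gate={gate}: {p:.2f} W")  # 0.34 W ungated, 0.26 W gated
```

In the usage example, gating the two half-active instructions removes 32 of 128 lane activations, so the hypothetical ALU energy drops by 25%. Lowering `vdd` to 0.9 × `vdd_nom` in this sketch cuts energy per access by 19% (0.9^2 = 0.81), which is the first-order reason DVFS saves energy in phases that do not need full frequency. Static (leakage) power, which a real model must also account for, is ignored here.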




Information

Published In

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013
686 pages
ISBN:9781450320795
DOI:10.1145/2485922
  • ACM SIGARCH Computer Architecture News
    Volume 41, Issue 3
    ISCA '13
    June 2013
    666 pages
    ISSN:0163-5964
    DOI:10.1145/2508148
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • IEEE CS


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 June 2013


Author Tags

  1. CUDA
  2. GPU architecture
  3. energy
  4. power
  5. power estimation

Qualifiers

  • Research-article


Conference

ISCA'13

Acceptance Rates

ISCA '13 Paper Acceptance Rate 56 of 288 submissions, 19%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 286
  • Downloads (last 6 weeks): 48
Reflects downloads up to 07 Dec 2024

Cited By

  • (2024) VESTA: Power Modeling with Language Runtime Events. Proceedings of the ACM on Programming Languages 8:PLDI, pp. 621-646. DOI: 10.1145/3656402. Online publication date: 20-Jun-2024.
  • (2024) SERENUS: Alleviating Low-Battery Anxiety Through Real-time, Accurate, and User-Friendly Energy Consumption Prediction of Mobile Applications. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, pp. 1-20. DOI: 10.1145/3654777.3676437. Online publication date: 13-Oct-2024.
  • (2024) Cross-Core Data Sharing for Energy-Efficient GPUs. ACM Transactions on Architecture and Code Optimization. DOI: 10.1145/3653019. Online publication date: 18-Mar-2024.
  • (2024) Compiler-Managed Replication of CUDA Kernels for Reliable Execution of GPGPU Applications. Journal of Circuits, Systems and Computers 33:14. DOI: 10.1142/S0218126624502542. Online publication date: 18-Apr-2024.
  • (2024) TURBULENCE: Complexity-Effective Out-of-Order Execution on GPU With Distance-Based ISA. IEEE Computer Architecture Letters 23:2, pp. 175-178. DOI: 10.1109/LCA.2023.3289317. Online publication date: Jul-2024.
  • (2024) A Performance and Power Comparison of Contemporary GPGPU Architectures. 2024 3rd International Conference for Innovation in Technology (INOCON), pp. 1-5. DOI: 10.1109/INOCON60754.2024.10512242. Online publication date: 1-Mar-2024.
  • (2024) Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 263-279. DOI: 10.1109/HPCA57654.2024.00029. Online publication date: 2-Mar-2024.
  • (2024) PresCount: Effective Register Allocation for Bank Conflict Reduction. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 170-181. DOI: 10.1109/CGO57630.2024.10444841. Online publication date: 2-Mar-2024.
  • (2023) MOELA: A Multi-Objective Evolutionary/Learning Design Space Exploration Framework for 3D Heterogeneous Manycore Platforms. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6. DOI: 10.23919/DATE56975.2023.10137276. Online publication date: Apr-2023.
  • (2023) TURBULENCE: Complexity-effective Out-of-order Execution on GPU with Distance-based ISA. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-2. DOI: 10.23919/DATE56975.2023.10137216. Online publication date: Apr-2023.