Link to original content: https://unpaywall.org/10.1109/IPDPS.2018.00037
Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs (Conference) | OSTI.GOV

Title: Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs

Conference

Wavefront loops are widely used in many scientific applications, e.g., partial differential equation (PDE) solvers and sequence alignment tools. However, due to the data dependencies in wavefront loops, it is challenging to fully utilize the abundant compute units of GPUs and to reuse data through their memory hierarchy. Existing solutions can only optimize for these factors to a limited extent. For example, tiling-based methods optimize memory access but may result in load imbalance, while compensation-based methods, which change the original order of computation to expose more parallelism and then compensate for the change, suffer from both global synchronization overhead and limited generality. In this paper, we first prove the circumstances under which breaking data dependencies and properly changing the sequence of computation operators in our compensation-based method does not affect the correctness of results. Based on this analysis, we design a highly efficient compensation-based parallelization method for GPUs. Our method provides weighted scan-based GPU kernels to optimize the computation and combines them with the tiling method to optimize memory access and synchronization. The performance results on the NVIDIA K80 and P100 GPU platforms demonstrate that our method can achieve significant improvements over the state-of-the-art research for four types of real-world application kernels.
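The compensation idea summarized in the abstract can be illustrated with a minimal CPU-side sketch in Python. This is not the paper's GPU kernel: the recurrence, the linear penalty `GAP`, the zero boundary, and both function names are assumptions chosen only for illustration. The first function evaluates a wavefront recurrence in its original order. The second breaks the left-neighbor dependency within each row, computes every cell from the previous row alone, and then compensates with a weighted prefix-max scan.

```python
from itertools import accumulate
import random

GAP = 2  # hypothetical linear penalty; not a value taken from the paper


def wavefront_sequential(score, gap=GAP):
    """Reference order: each cell waits for its top-left, top, and left neighbors."""
    rows, cols = len(score), len(score[0])
    A = [[0] * cols for _ in range(rows)]  # row 0 and column 0 form a zero boundary
    for i in range(1, rows):
        for j in range(1, cols):
            A[i][j] = max(A[i - 1][j - 1] + score[i][j],
                          A[i - 1][j] - gap,
                          A[i][j - 1] - gap)
    return A


def wavefront_compensated(score, gap=GAP):
    """Break the left dependency per row, then repair it with a weighted max-scan."""
    rows, cols = len(score), len(score[0])
    A = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        prev = A[i - 1]
        # Step 1: ignore the left neighbor. Every T[j] depends only on the
        # previous row, so all columns could be computed in parallel.
        T = [0] + [max(prev[j - 1] + score[i][j], prev[j] - gap)
                   for j in range(1, cols)]
        # Step 2: compensation. A[i][j] = max over k <= j of T[k] - (j - k) * gap,
        # which equals prefix_max(T[k] + k * gap) - j * gap.
        shifted = [T[k] + k * gap for k in range(cols)]
        prefix = list(accumulate(shifted, max))
        A[i] = [prefix[j] - j * gap for j in range(cols)]
    return A


if __name__ == "__main__":
    rng = random.Random(0)
    score = [[rng.randint(-3, 5) for _ in range(8)] for _ in range(6)]
    # Both evaluation orders should agree on every cell.
    assert wavefront_sequential(score) == wavefront_compensated(score)
```

The rewrite is valid here because `max` is associative and the penalty is linear in the column distance, so the row recurrence unrolls into a prefix maximum. On a GPU the prefix maximum would be computed with a parallel scan rather than the sequential `accumulate` used in this sketch.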

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1474547
Resource Relation:
Conference: IEEE International Parallel and Distributed Processing Symposium (IPDPS 2018), Vancouver, Canada, May 21-25, 2018
Country of Publication:
United States
Language:
English

