SC 2020: Virtual Event / Atlanta, Georgia, USA
- Christine Cuicchi, Irene Qualters, William T. Kramer:
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020, Virtual Event / Atlanta, Georgia, USA, November 9-19, 2020. IEEE/ACM 2020, ISBN 978-1-7281-9998-6
ACM Gordon Bell finalists
- Hisashi Yashiro, Koji Terasaki, Yuta Kawai, Shuhei Kudo, Takemasa Miyoshi, Toshiyuki Imamura, Kazuo Minami, Hikaru Inoue, Tatsuo Nishiki, Takayuki Saji, Masaki Satoh, Hirofumi Tomita:
A 1024-member ensemble data assimilation with 3.5-km mesh global weather simulations. 1
- Ruonan Wang, Rodrigo Tobar, Markus Dolensky, Tao An, Andreas Wicenec, Chen Wu, Fred Dulwich, Norbert Podhorszki, Valentine Anantharaj, Eric Suchyta, Bao-qiang Lao, Scott Klasky:
Processing full-scale Square Kilometre Array data on the Summit supercomputer. 2
- Chisachi Kato, Yoshinobu Yamade, Katsuhiro Nagano, Kiyoshi Kumahata, Kazuo Minami, Tatsuo Nishikawa:
Toward realization of numerical towing-tank tests by wall-resolved large eddy simulation based on 32 billion grid finite-element computation. 3
- Mauro Del Ben, Charlene Yang, Zhenglu Li, Felipe H. da Jornada, Steven G. Louie, Jack Deslippe:
Accelerating large-scale excited-state GW calculations on leadership HPC systems. 4
- Weile Jia, Han Wang, Mohan Chen, Denghui Lu, Lin Lin, Roberto Car, Weinan E, Linfeng Zhang:
Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning. 5
- Ramakrishnan Kannan, Piyush Sao, Hao Lu, Drahomira Herrmannova, Vijay Thakkar, Robert M. Patton, Richard W. Vuduc, Thomas E. Potok:
Scalable knowledge graph analytics at 136 petaflop/s. 6
Constraints & physics in machine learning
- Ankit Srivastava, Sriram P. Chockalingam, Srinivas Aluru:
A parallel framework for constraint-based Bayesian network learning via Markov blanket discovery. 7
- Romit Maulik, Romain Egele, Bethany Lusch, Prasanna Balaprakash:
Recurrent neural network architecture search for geophysical emulation. 8
- Chiyu Max Jiang, Soheil Esmaeilzadeh, Kamyar Azizzadenesheli, Karthik Kashinath, Mustafa Mustafa, Hamdi A. Tchelepi, Philip Marcus, Prabhat, Anima Anandkumar:
MeshfreeFlowNet: a physics-constrained deep continuous space-time super-resolution framework. 9
I/O
- Qiao Kang, Robert B. Ross, Robert Latham, Sunwoo Lee, Ankit Agrawal, Alok N. Choudhary, Wei-keng Liao:
Improving all-to-many personalized communication in two-phase I/O. 10
- Zhenbo Qiao, Qing Liu, Norbert Podhorszki, Scott Klasky, Jieyang Chen:
Taming I/O variation on QoS-less HPC storage: what can applications do? 11
- Jian Zhang, Tao Xie, Yuzhuo Jing, Yanjie Song, Guanzhou Hu, Si Chen, Shu Yin:
BORA: a bag optimizer for robotic analysis. 12
Quantum simulation
- Ang Li, Omer Subasi, Xiu Yang, Sriram Krishnamoorthy:
Density matrix quantum circuit simulation via the BSP machine on modern GPU clusters. 13
- Yuchen Pang, Tianyi Hao, Annika Dugad, Yiqing Zhou, Edgar Solomonik:
Efficient 2D tensor network simulation of quantum systems. 14
- Tirthak Patel, Devesh Tiwari:
Veritas: accurately estimating the correct output on noisy intermediate-scale quantum computers. 15
Sparsity in deep learning
- Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu:
Accelerating sparse DNN models without hardware-support via tile-wise sparsity. 16
- Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen:
Sparse GPU kernels for deep learning. 17
- Qingxiao Sun, Yi Liu, Ming Dun, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian:
SpTFS: sparse tensor format selection for MTTKRP via deep learning. 18
Memory efficient deep learning
- Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka:
Scaling distributed deep learning workloads beyond the memory capacity with KARMA. 19
- Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He:
ZeRO: memory optimizations toward training trillion parameter models. 20
- Minhui Xie, Kai Ren, Youyou Lu, Guangxu Yang, Qingxing Xu, Bihai Wu, Jiazhen Lin, Hongbo Ao, Wanhong Xu, Jiwu Shu:
Kraken: memory-efficient continual learning for large-scale real-time recommendations. 21
Molecular dynamics & material science
- Xiaohui Duan, Ping Gao, Meng Zhang, Tingjian Zhang, Hongsong Meng, Yuxuan Li, Bertil Schmidt, Haohuan Fu, Lin Gan, Wei Xue, Weiguo Liu, Guangwen Yang:
Cell-list based molecular dynamics on many-core processors: a case study on Sunway TaihuLight supercomputer. 22
- Hansol Suh, Tobin Isaac:
Evaluation of a minimally synchronous algorithm for 2:1 octree balance. 23
- Ryan Levy, Edgar Solomonik, Bryan K. Clark:
Distributed-memory DMRG via sparse and dense parallel tensor contractions. 24
Networks
- Min Yee Teh, Yu-Han Hung, George Michelogiannakis, Shijia Yan, Madeleine Glick, John Shalf, Keren Bergman:
TAGO: rethinking routing design in high performance reconfigurable networks. 25
- Gengchen Liu, Roberto Proietti, Marjan Fariborz, Pouya Fotouhi, Xian Xiao, S. J. Ben Yoo:
Architecture and performance studies of 3D-Hyper-FleX-LION for reconfigurable all-to-all HPC networks. 26
- Maciej Besta, Marcel Schneider, Marek Konieczny, Karolina Cynk, Erik Henriksson, Salvatore Di Girolamo, Ankit Singla, Torsten Hoefler:
FatPaths: routing in supercomputers and data centers when shortest paths fall short. 27
Tools
- Yuyang Jin, Haojie Wang, Teng Yu, Xiongchao Tang, Torsten Hoefler, Xu Liu, Jidong Zhai:
ScalAna: automating scaling loss detection with graph analysis. 28
- Xin You, Hailong Yang, Zhongzhi Luan, Depei Qian, Xu Liu:
ZeroSpy: exploring software inefficiency with redundant zeros. 29
- Qidong Zhao, Xu Liu, Milind Chabbi:
DrCCTProf: a fine-grained call path profiler for ARM-based clusters. 30
AI for IT
- Di Zhang, Dong Dai, Youbiao He, Forrest Sheng Bao, Bing Xie:
RLScheduler: an automated HPC batch job scheduler using reinforcement learning. 31
- Quan Chen, Shuai Xue, Shang Zhao, Shanpei Chen, Yihao Wu, Yu Xu, Zhuo Song, Tao Ma, Yong Yang, Minyi Guo:
Alita: comprehensive performance isolation through bias resource management for public clouds. 32
- Mihailo Isakov, Eliakin Del Rosario, Sandeep Madireddy, Prasanna Balaprakash, Philip H. Carns, Robert B. Ross, Michel A. Kinsy:
HPC I/O throughput bottleneck analysis with explainable local models. 33
Communication & networks
- S. Mahdieh Ghazimirsaeed, Qinghua Zhou, Amit Ruhela, Mohammadreza Bayatpour:
A hierarchical and load-aware design for large message neighborhood collectives. 34
- Daniele De Sensi, Salvatore Di Girolamo, Kim H. McMahon, Duncan Roweth, Torsten Hoefler:
An in-depth analysis of the Slingshot interconnect. 35
- Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen, Pavan Balaji:
CAB-MPI: exploring interprocess work-stealing towards balanced MPI communication. 36
GPU-accelerated applications
- Mert Hidayetoglu, Tekin Bicer, Simon Garcia De Gonzalo, Bin Ren, Vincent De Andrade, Doga Gürsoy, Raj Kettimuthu, Ian T. Foster, Wen-mei W. Hwu:
Petascale XCT: 3D image reconstruction with hierarchical communications on multi-GPU nodes. 37
- Malte Brunn, Naveen Himthani, George Biros, Miriam Mehl, Andreas Mang:
Multi-node multi-GPU diffeomorphic image registration for large-scale imaging problems. 38
- Sneha D. Goenka, Yatish Turakhia, Benedict Paten, Mark Horowitz:
SegAlign: a scalable GPU-based whole genome aligner. 39
System software at scale
- Edgar A. León, Trent D'Hooge, Nathan Hanford, Ian Karlin, Ramesh Pankajakshan, Jim Foraker, Chris Chambreau, Matthew L. Leininger:
TOSS-2020: a commodity software stack for HPC. 40
- George Ostrouchov, Don Maxwell, Rizwan A. Ashraf, Christian Engelmann, Mallikarjun Shankar, James H. Rogers:
GPU lifetimes on Titan supercomputer: survival analysis and reliability. 41
- Gabor Torok, Mark R. Day, Rebecca Hartman-Baker, Cory Snavely:
Iris: allocation banking and identity and access management for the exascale era. 42
Distributed deep learning
- Dhiraj D. Kalamkar, Evangelos Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, Alexander Heinecke:
Optimizing deep learning recommender systems training on CPU cluster architectures. 43
- Indu Thangakrishnan, Derya Cavdar, Can Karakus, Piyush Ghai, Yauheni Selivonchyk, Cory Pruce:
Herring: rethinking the parameter server at scale for the cloud. 44
- Arpan Jain, Ammar Ahmad Awan, Asmaa M. Aljuhani, Jahanzeb Maqbool Hashmi, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda, Raghu Machiraju, Anil Parwani:
GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training. 45
Exascale and beyond
- Tirthak Patel, Abhay Potharaju, Baolin Li, Rohan Basu Roy, Devesh Tiwari:
Experimental evaluation of NISQ quantum computers: error measurement, characterization, and implications. 46
- Mitsuhisa Sato, Yutaka Ishikawa, Hirofumi Tomita, Yuetsu Kodama, Tetsuya Odajima, Miwako Tsuji, Hisashi Yashiro, Masaki Aoki, Naoyuki Shida, Ikuo Miyoshi, Kouichi Hirai, Atsushi Furuya, Akira Asato, Kuniki Morita, Toshiyuki Shimizu:
Co-design for A64FX manycore processor and "Fugaku". 47
- Kevin T. Pedretti, Andrew J. Younge, Simon D. Hammond, James H. Laros III, Matthew L. Curry, Michael J. Aguilar, Robert J. Hoekstra, Ron Brightwell:
Chronicles of Astra: challenges and lessons from the first petascale Arm supercomputer. 48
Floating-point accuracy
- Hui Guo, Ignacio Laguna, Cindy Rubio-González:
pLiner: isolating lines of floating-point code for compiler-induced variability. 49
- Hugo Brunie, Costin Iancu, Khaled Z. Ibrahim, Philip Brisk, Brandon Cook:
Tuning floating-point precision using dynamic program information and temporal locality. 50
- Arnab Das, Ian Briggs, Ganesh Gopalakrishnan, Sriram Krishnamoorthy, Pavel Panchekha:
Scalable yet rigorous floating-point error analysis. 51
Programmable parallelism
- Tianxi Li, Dipti Shankar, Shashank Gugnani, Xiaoyi Lu:
RDMP-KV: designing remote direct memory persistence based key-value stores with PMEM. 52
- Souradip Ghosh, Michael Cuevas, Simone Campanoni, Peter A. Dinda:
Compiler-based timing for extremely fine-grain preemptive parallelism. 53
- Bradley Swain, Yanze Li, Peiming Liu, Ignacio Laguna, Giorgis Georgakoudis, Jeff Huang:
OMPRacer: a scalable and precise static race detector for OpenMP programs. 54
GPU algorithms and optimizations
- Marco Minutoli, Prathyush Sambaturu, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman, Anil Vullikanti:
Preempt: scalable epidemic interventions using submodular optimization on multi-GPU systems. 55
- Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S. Li, Hang Liu:
C-SAW: a framework for graph sampling and random walk on GPUs. 56
- Chih-Hao Fang, Sudhir B. Kylasa, Fred Roosta, Michael W. Mahoney, Ananth Grama:
Newton-ADMM: a distributed GPU-accelerated optimizer for multiclass classification problems. 57
Porting applications to specific hardware
- Kamil Rocki, Dirk Van Essendelft, Ilya Sharapov, Robert Schreiber, Michael Morrison, Vladimir Kibardin, Andrey Portnoy, Jean-Francois Dietiker, Madhava Syamlal, Michael James:
Fast stencil-code computation on a wafer-scale processor. 58
- Tiziano De Matteis, Johannes de Fine Licht, Torsten Hoefler:
fBLAS: streaming linear algebra on FPGA. 59
- Nariaki Tateiwa, Yuji Shinano, Satoshi Nakamura, Akihiro Yoshida, Shizuo Kaji, Masaya Yasuda, Katsuki Fujisawa:
Massive parallelization for finding shortest lattice vectors based on ubiquity generator framework. 60
Simulation, modeling and benchmarks
- Isaac Boixaderas, Darko Zivanovic, Sergi Moré, Javier Bartolome, David Vicente, Marc Casas, Paul M. Carpenter, Petar Radojkovic, Eduard Ayguadé:
Cost-aware prediction of uncorrected DRAM errors in the field. 61
- Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Qinglei Cao, George Bosilca, Seema Mirchandaney, Wonchan Lee, Sean Treichler, Patrick S. McCormick, Alex Aiken:
Task bench: a parameterized benchmark for evaluating parallel runtime performance. 62
- Wenqian Dong, Zhen Xie, Gokcen Kestor, Dong Li:
Smart-PGSim: using neural network to accelerate AC-OPF power grid simulation. 63
Storage resiliency
- Amirhessam Yazdi, Xing Lin, Lei Yang, Feng Yan:
SEFEE: lightweight storage error forecasting in large-scale enterprise storage systems. 64
- Saurabh Jha, Shengkun Cui, Subho S. Banerjee, Tianyin Xu, Jeremy Enos, Mike Showerman, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer:
Live forensics for HPC systems: a case study on distributed storage systems. 65
- Haiyang Shi, Xiaoyi Lu:
INEC: fast and coherent in-network erasure coding. 66
Containers and serverless computing
- Pradeep Ambati, Noman Bashir, David E. Irwin, Prashant J. Shenoy:
Waiting game: optimally provisioning fixed resources for cloud-enabled schedulers. 67
- Luping Wang, Qizhen Weng, Wei Wang, Chen Chen, Bo Li:
Metis: learning to schedule long-running applications in shared container clusters at scale. 68
- Ahsan Ali, Riccardo Pinciroli, Feng Yan, Evgenia Smirni:
Batch: machine learning inference serving on serverless platforms with adaptive batching. 69
Graph neural networks
- Alok Tripathy, Katherine A. Yelick, Aydin Buluç:
Reducing communication in graph neural network training. 70
- Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang:
FeatGraph: a flexible and efficient backend for graph neural network systems. 71
- Guyue Huang, Guohao Dai, Yu Wang, Huazhong Yang:
GE-SpMM: general-purpose sparse matrix-matrix multiplication on GPUs for graph neural networks. 72
Linear algebra and its application
- Ammar Hakim, James Juno:
Alias-free, matrix-free, and quadrature-free discontinuous Galerkin algorithms for (plasma) kinetic equations. 73
- Srinivas Eswar, Koby Hayashi, Grey Ballard, Ramakrishnan Kannan, Richard W. Vuduc, Haesun Park:
Distributed-memory parallel symmetric nonnegative matrix factorization. 74
- Oguz Selvitopi, Saliya Ekanayake, Giulia Guidi, Georgios A. Pavlopoulos, Ariful Azad, Aydin Buluç:
Distributed many-to-many protein sequence alignment using sparse matrices. 75
Resilience and power management
- Luc Jaulmes, Miquel Moretó, Mateo Valero, Mattan Erez, Marc Casas:
Runtime-guided ECC protection using online estimation of memory vulnerability. 76
- Twinkle Jain, Gene Cooperman:
CRAC: checkpoint-restart architecture for CUDA with streams and UVM. 77
- Xiaofeng Hou, Chao Li, Jiacheng Liu, Lu Zhang, Yang Hu, Minyi Guo:
ANT-man: towards agile power management in the microservice era. 78
Computational chemistry
- Jinsung Kim, Ajay Panyala, Bo Peng, Karol Kowalski, P. Sadayappan, Sriram Krishnamoorthy:
Scalable heterogeneous execution of a coupled-cluster model with perturbative triples. 79
- Michael Lass, Robert Schade, Thomas D. Kühne, Christian Plessl:
A submatrix-based method for approximate matrix function evaluation in the quantum chemistry code CP2K. 80
- Giuseppe M. J. Barca, David L. Poole, Jorge L. Galvez Vallejo, Melisa Alkan, Colleen Bertoni, Alistair P. Rendell, Mark S. Gordon:
Scaling the Hartree-Fock matrix build on Summit. 81
Data analysis and reduction
- Haoyuan Xing, Gagan Agrawal, Rajiv Ramnath:
MoHA: a composable system for efficient in-situ analytics on heterogeneous HPC systems. 82
- Pascal Grosset, Christopher M. Biwer, Jesus Pulido, Arvind T. Mohan, Ayan Biswas, John Patchett, Terece L. Turton, David H. Rogers, Daniel Livescu, James P. Ahrens:
Foresight: analysis that matters for data reduction. 83
- Tirthak Patel, Zhengchun Liu, Raj Kettimuthu, Paul Rich, William E. Allcock, Devesh Tiwari:
Job characteristics on large-scale systems: long-term analysis, quantification, and implications. 84
Kernel optimizations
- Hengjie Wang, Aparna Chandramowlishwaran:
Pencil: a pipelined algorithm for distributed stencils. 85
- Serif Yesil, Azin Heidarshenas, Adam Morrison, Josep Torrellas:
Speeding up SpMV for power-law graph analytics by enhancing locality & vectorization. 86
- Süreyya Emre Kurt, Aravind Sukumaran-Rajam, Fabrice Rastello, P. Sadayappan:
Efficient tiled sparse matrix multiplication through matrix signatures. 87
GPU-based tools and modeling
- Abdul Rehman Anwer, Guanpeng Li, Karthik Pattabiraman, Michael B. Sullivan, Timothy Tsai, Siva Kumar Sastry Hari:
GPU-trident: efficient modeling of error propagation in GPU programs. 88
- Keren Zhou, Yueming Hao, John M. Mellor-Crummey, Xiaozhu Meng, Xu Liu:
GVProf: a value profiler for GPU-based clusters. 89
- Shaoqi Wang, Oscar J. Gonzalez, Xiaobo Zhou, Thomas Williams, Brian D. Friedman, Martin Havemann, Thomas Y. C. Woo:
An efficient and non-intrusive GPU scheduling framework for deep learning training systems. 90
Multiphysics applications
- Max P. Katz, Ann S. Almgren, Maria Barrios Sazo, Kiran Eiden, Kevin Gott, Alice Harpole, Jean M. Sexton, Donald E. Willcox, Weiqun Zhang, Michael Zingale:
Preparing nuclear astrophysics for exascale. 91
- Luca Bertagna, Oksana Guba, Mark A. Taylor, James G. Foucar, Jeff Larkin, Andrew M. Bradley, Sivasankaran Rajamanickam, Andrew G. Salinger:
A performance-portable nonhydrostatic atmospheric dycore for the Energy Exascale Earth System Model running at cloud-resolving resolutions. 92
- Yasuhiro Idomura, Takuya Ina, Yussuf Ali, Toshiyuki Imamura:
Acceleration of fusion plasma turbulence simulations using the mixed-precision communication-avoiding Krylov method. 93
Quantized and factorized deep learning
- J. Gregory Pauloski, Zhao Zhang, Lei Huang, Weijia Xu, Ian T. Foster:
Convolutional neural network training with distributed K-FAC. 94
- Yongkweon Jeon, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee:
BiQGEMM: matrix multiplication with lookup table for binary-coding-based quantized DNNs. 95
- Hsiang-Tsung Kung, Bradley McDanel, Sai Qian Zhang:
Term quantization: furthering quantization at run time. 96
Compiler optimizations
- Troels Henriksen, Sune Hellfritzsch, Ponnuswamy Sadayappan, Cosmin E. Oancea:
Compiling generalized histograms for GPU. 97
- Jacob Lambert, Seyong Lee, Jeffrey S. Vetter, Allen D. Malony:
CCAMP: an integrated translation and optimization framework for OpenACC and OpenMP. 98
Graph algorithms
- Maciej Besta, Armon Carigiet, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Torsten Hoefler:
High-performance parallel graph coloring with strong guarantees on work, depth, and quality. 99
- Tianhui Shi, Mingshu Zhai, Yi Xu, Jidong Zhai:
GraphPi: high performance graph pattern matching through effective redundancy elimination. 100
- Stijn Heldens, Pieter Hijma, Ben van Werkhoven, Jason Maassen, Henri E. Bal, Rob van Nieuwpoort:
Rocket: efficient and scalable all-pairs computations on heterogeneous platforms. 101