GPU-accelerated matrix-matrix multiplication for matrices of arbitrary size, callable from Python. The C++ and CUDA code is exposed to Python using pybind11.
- CUDA installed in /usr/local/cuda
- CMake 3.3 or later
- Python 3.8.10 or later
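- PythonInterp 3.6 or later (the Python interpreter, as located by CMake's FindPythonInterp module)
- PythonLibs 3.6 or later (the Python development headers and libraries, as located by CMake's FindPythonLibs module)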
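The requirements above can be checked before building with a small preflight script. This is only a sketch and not part of the repository: the function names `parse_version` and `check_prerequisites` are made up for illustration, and it assumes the usual CUDA toolkit layout in which `nvcc` lives under `bin/` inside the CUDA directory.

```python
import os
import re
import shutil
import subprocess
import sys


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def check_prerequisites(cuda_home="/usr/local/cuda"):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # Assumption: a standard toolkit install places nvcc in <cuda_home>/bin.
    if not os.path.isfile(os.path.join(cuda_home, "bin", "nvcc")):
        problems.append("nvcc not found under " + cuda_home)

    # CMake 3.3 or later must be on PATH.
    cmake = shutil.which("cmake")
    if cmake is None:
        problems.append("cmake not found on PATH")
    else:
        result = subprocess.run([cmake, "--version"], capture_output=True, text=True)
        version = parse_version(result.stdout)
        if version is None or version < (3, 3):
            problems.append("CMake 3.3 or later is required")

    # The interpreter running this script must be Python 3.8.10 or later.
    if sys.version_info < (3, 8, 10):
        problems.append("Python 3.8.10 or later is required")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```

The script does not check for the Python development headers; CMake reports those itself when it runs.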
The project should compile out of the box with the following commands:
sudo chmod +x bind_code.sh
./bind_code.sh
python3 matmul.py
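Because the code targets arbitrary matrix sizes, one way to sanity-check a build is to compare its output against a plain reference implementation on non-square shapes and on shapes that are not multiples of a typical GPU block size. The sketch below is an illustration, not the contents of matmul.py: `matmul_fn` stands in for whatever function the compiled module exposes, the nested-list interface is an assumption (the real binding may expect NumPy arrays and need a thin wrapper), and the tolerance of 1e-4 is a guess suited to single-precision results.

```python
import random


def reference_matmul(a, b):
    """Multiply two matrices given as nested lists, using the textbook triple loop."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def agrees_with_reference(matmul_fn, shapes, tol=1e-4, seed=0):
    """Return True if matmul_fn matches the reference on every (m, k, n) shape."""
    rng = random.Random(seed)
    for m, k, n in shapes:
        a = random_matrix(m, k, rng)
        b = random_matrix(k, n, rng)
        expected = reference_matmul(a, b)
        actual = matmul_fn(a, b)  # hypothetical: the GPU-backed function under test
        for i in range(m):
            for j in range(n):
                if abs(actual[i][j] - expected[i][j]) > tol:
                    return False
    return True


# Shapes chosen to include a 1x1 case, non-square cases, and odd dimensions.
SHAPES = [(1, 1, 1), (3, 5, 2), (17, 33, 9), (64, 64, 64)]
```

Calling `agrees_with_reference(gpu_matmul, SHAPES)`, where `gpu_matmul` is the function imported from the compiled module, returns `True` only if every entry of every product is within the tolerance.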