A high-throughput and memory-efficient inference and serving engine for LLMs
Updated Dec 2, 2024 (Python)
Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
Purplecoin/XPU Core integration/staging tree
Finetune an LLM to generate SQL from text on Intel GPUs (XPUs) using QLoRA
A Cloud-native Bare-metal Networking Orchestration