blackwell

Here are 128 public repositories matching this topic...

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Updated Jun 19, 2026
Python

sgl-project / sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

reinforcement-learning cuda inference transformer moe attention llama glm minimax wan diffusion vlm blackwell llm qwen deepseek gpt-oss qwen-image

Updated Jun 19, 2026
Python

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

cuda pytorch moe blackwell llm-serving

Updated Jun 19, 2026
Python

lightseekorg / tokenspeed

TokenSpeed is a speed-of-light LLM inference engine.

glm minimax vlm kimi blackwell llm qwen speed-of-light deepseek tokenspeed gpt-oss lightseek

Updated Jun 19, 2026
Python

openlake-project / openlake

OpenLake is a high-performance storage engine for efficient LLM inference and GPU training.

rust storage gpu high-performance throughput rdma gpt model-serving blackwell llm llm-training

Updated Jun 16, 2026
Rust

GradientHQ / parallax

Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere

python distributed-systems chatbot pytorch transformer llama glm minimax kimi blackwell large-language-models llm llm-serving qwen deepseek oss-gpt decentralized-inference

Updated Jun 18, 2026
Python

NVIDIA / cudnn-frontend

cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.

Updated Jun 17, 2026
Python

patrick-toulme / pyptx

A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch

pytorch nvidia hopper nvidia-gpu ptx jax blackwell

Updated May 8, 2026
Python

AEON-7 / Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash

Fully uncensored, capability-enhanced abliteration of Qwen3.6-27B. NVFP4 + z-lab DFlash speculative decoding (n=12) on the unified ghcr.io/aeon-7/aeon-vllm-ultimate:latest container, tuned for long-context draft acceptance on DGX Spark. 6 HF variants (BF16/NVFP4/MTP/MTP-XS), docker-compose, and QuickStart.

quantization uncensored blackwell llm vllm qwen speculative-decoding abliteration qwen3 nvfp4 dgx-spark dflash

Updated Jun 18, 2026
Python

IST-DASLab / qutlass

QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning

cuda blackwell quantization-aware-training post-training-quantization

Updated Nov 11, 2025
C++

eelbaz / dgx-spark-vllm-setup

One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs (sm_121 architecture)

machine-learning ai deep-learning gpu cuda pytorch nvidia arm64 blackwell llm vllm llm-inference gb10 dgx-spark

Updated Oct 28, 2025
Shell

0xSero / glm-5.2-sm120

GLM-5.2-NVFP4-REAP-469B serving on SM120 (4× RTX PRO 6000 Blackwell) — one-command vLLM launch recipe, 250K context, DeepSeek Sparse Attention + MTP speculative decode

moe glm reap blackwell vllm llm-inference sm120 nvfp4 rtx-pro-6000

Updated Jun 19, 2026
Shell

dougeeai / llama-cpp-python-wheels

Pre-built wheels for llama-cpp-python across platforms and CUDA versions

Updated Apr 18, 2026

5p00kyy / club-5060ti

Practical local LLM recipes and benchmarks for RTX 5060 Ti setups

benchmark cuda nvidia blackwell llama-cpp vllm qwen ik-llama-cpp

Updated May 30, 2026
HTML

6Morpheus6 / deepspeed-windows-wheels

Prebuilt DeepSpeed wheels for Windows with NVIDIA GPU support. Supports GTX 10 through RTX 50 series. Compiled with PyTorch 2.7 and 2.8 and CUDA 12.8.

windows blackwell deepspeed prebuilt-wheels

Updated Feb 13, 2026

AEON-7 / vllm-dflash

DFlash vLLM for DGX Spark — Plug & Play Block-Diffusion Speculative Decoding

docker inference nvidia blackwell llm vllm qwen speculative-decoding block-diffusion nvfp4 dgx-spark dflash

Updated May 1, 2026
Python

AEON-7 / comfyui-aeon-spark

Bleeding-edge ComfyUI for NVIDIA DGX Spark (GB10/Blackwell/sm_121a). CUDA 13 + SageAttention v3 (sm_121a) + NVFP4 + 14 custom-node packs + Flux 2 Dev / LTX 2.3 22B / ACE-Step v1.5 XL Turbo pre-bundled with abliterated text-encoder paths.

docker flux blackwell comfyui sageattention ltx-video ace-step nvfp4 dgx-spark sm-121a

Updated May 4, 2026
Shell

dconsorte / pytorch-tensorflow-gpu

RTX 5090 & RTX 5060 Docker container with PyTorch + TensorFlow. First fully-tested Blackwell GPU support for ML/AI. CUDA 12.8, Python 3.11, Ubuntu 24.04. Works with RTX 50-series (5090/5080/5070/5060) and RTX 40-series.

docker machine-learning deep-learning tensorflow cuda pytorch gpu-computing blackwell rtx-5090 rtx-5060 blackwell-gpu nvidia-blackwell cuda-12-8 rtx-50-series rtx-5080

Updated Jul 8, 2025
Shell

hiroki-abe-58 / ComfyUI-Win-Blackwell

python ai cuda blackwell aigc comfyui comfyui-workflow rtx5090

Updated Mar 2, 2026
PowerShell

Mekopa / whisperx-blackwell

GPU-accelerated WhisperX on NVIDIA Blackwell (SM_121) - DGX Spark compatible

audio docker machine-learning deep-learning gpu cuda pytorch nvidia speech-recognition transcription asr speaker-diarization dgx blackwell pyannote whisperx dgx-spark sm-121

Updated Apr 23, 2026
Python
