qwen-vl

Here are 49 public repositories matching this topic...

gokayfem / awesome-vlm-architectures

Famous Vision Language Models and Their Architectures

awesome awesome-list kosmos clip image-encoder vlm blip multimodal text-encoder vision-language-model llava internlm cogvlm qwen-vl

Updated Jan 11, 2026
Markdown

1038lab / ComfyUI-QwenVL

ComfyUI-QwenVL custom node: Integrates the Qwen-VL series, including Qwen2.5-VL and the latest Qwen3-VL, with GGUF support for advanced multimodal AI in text generation, image understanding, and video analysis.

comfyui customnodes qwen-vl qwen3-vl

Updated Feb 10, 2026
Python

zli12321 / Vision-Language-Models-Overview

A collection and survey of vision-language model papers and model GitHub repositories, continuously updated.

reinforcement-learning clip claude world-models multimodal-models sota-model llava blip2 gpt-4v gemini-pro deepseek vision-language-models qwen-vl llama-vision-model multimodal-benchmarks vision-language-model-applications finevision-pretrain-dataset

Updated Jun 3, 2026
HTML

zjysteven / lmms-finetune

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

finetuning multimodal vision-language foundation-models instruction-tuning large-language-model llava visual-instruction-tuning multimodal-large-language-models large-multimodal-models qwen-vl llava-next

Updated Feb 28, 2026
Python

aiptimizer / TurboOCR

Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.

ocr grpc nvidia text-recognition text-detection inference-server fp16 tensorrt rag fastapi pdf-extraction paddleocr easyocr document-ai document-parsing qwen-vl gpu-ocr

Updated Jun 11, 2026
C++

zli12321 / Vision-SR1

Reinforcement Learning of Vision Language Models with Self Visual Perception Reward

reinforcement-learning self-improvement self-rewarding vision-language-models qwen-vl grpo self-evolving-ai visual-perception-reward

Updated Mar 14, 2026
Python

reidbarber / webmarker

Mark web pages for use with vision-language models

som prompt gemini operator cua claude playwright prompt-engineering llms vision-language-model gpt4v qwen-vl gpt4o set-of-mark computer-use computer-using-agent

Updated Mar 8, 2026
TypeScript

dolphin-creator / VideoContext-Engine

Local Video RAG Engine. A FastAPI microservice for video understanding: Scene Detection + Whisper ASR + Qwen3-VL. Optimized for Apple Silicon (MLX) & Windows/Linux (Llama.cpp).

python microservice whisper mlx video-analysis rag fastapi apple-silicon llama-cpp local-ai qwen-vl local-ai-agents

Updated Dec 4, 2025
Python

Codeeaner / Computer-Use-Agent

An AI agent that can control your screen to complete any task

agent ai desktop agents cua ai-agents autogen ai-tools llm qwen-vl computer-use browser-use computer-use-agent qwen3 browser-use-agent desktop-au visual-language-mo computer-aut agent-com

Updated Oct 23, 2025
Jupyter Notebook

TIGER-AI-Lab / RewardHarness

Self-evolving agentic reward framework for image-editing evaluation — 47.4% on EditReward-Bench from only 100 preference demos, no reward-model training. arXiv 2605.08703.

image-editing gemini vlm preference-learning rlhf reward-model agentic qwen-vl self-evolving

Updated May 18, 2026
Python

janelu9 / EasyLLM

Run large language models easily.

llama fine-tuning megatron npu pretrain deepspeed rlhf vllm qwen deepseek qwen-vl

Updated Jun 18, 2026
Python

290298661-pixel / deepseek-eyes

Give DeepSeek eyes: MCP Server + Qwen-VL (Tongyi Qianwen VL). Clipboard image → vision model → text description, so DeepSeek can see images via the clipboard.

vision developer-tools chinese modelscope deepseek qwen-vl mcp-server claude-code

Updated May 31, 2026
Python

nanofatdog / video-to-prompt

🎬 Extract AI prompts from video using Vision LLM (llama.cpp API) — Gradio WebUI + CLI

video-processing gradio prompt-engineering ai-prompts llamacpp comfyui qwen-vl vision-llm

Updated May 26, 2026
Python

zhangguanghao523 / DynFrame

DynFrame: Adaptive Reasoning-Driven Multimodal Framework with Dynamic Frame Augmentation for Complex Video Understanding

reinforcement-learning video-understanding video-qa multimodal temporal-grounding chain-of-thought qwen-vl grpo dynamic-frame-retrieval

Updated May 29, 2026
Python

100noob / Qwen-Grasper

A robotic sequential grasping system integrating YOLO detection and Qwen-VLM fine-tuning, enabling a full loop from manual teaching to LLM-based logical manipulation.

numpy yolo object-detection opencv-python vlm opencv2 robotic-arm ncnn-model ultralytics raspberry-pi-5 qwen-vl unsloth yolo11

Updated Jun 1, 2026
Python

autodistill / autodistill-qwen-vl

Qwen-VL base model for use with Autodistill.

zero-shot-object-detection autodistill qwen-vl

Updated Feb 8, 2024
Python

cartesiosson / openvino-agent-stack

Run Qwen3-8B at ~18 tok/s on a Core Ultra laptop iGPU. Local-only LLM + VLM + ReAct agent stack with $0 token cost. Drop-in Claude Code backend.

docker-compose igpu openvino wsl2 llm-agent local-llm react-agent qwen-vl open-webui claude-code qwen3 lunar-lake intel-core-ultra

Updated Jun 8, 2026
Python

gokul6350 / GNX-CLI

🤖 The Next-Gen AI Agent. Unlike normal agents, it goes beyond text and can control your Desktop & Android.

android cli machine-learning automation ai computer-vision adb desktop-automation pyautogui ai-agent llm vision-language-model qwen-vl computer-use

Updated Feb 15, 2026
Python

anubisshah / sgun-qwen3.5-comfyui

Enable local integration of Qwen3.5 models with ComfyUI for text generation and multimodal visual tasks, featuring automatic model management and precision control.

Updated Jun 16, 2026
Python

mangobanaani / movie2story

Creates text from video and audio using Qwen-VL and Whisper

python machine-learning qwen-vl

Updated Jan 24, 2026
Jupyter Notebook
