Mark web pages for use with vision-language models
Updated Mar 8, 2026 · TypeScript
Set-of-Mark detection pipeline for macOS — Apple Vision, YOLO11, and VLM on MLX. Transforms screenshots into numbered element maps and structured JSON manifests.
Temporal smoothing for UI element detection with OmniParser integration
Execution API that gives language models (Claude Code, Gemini…) the ability to see web interfaces — Playwright · Set-of-Mark · ReAct