Generate and edit images in Claude with OpenAI's gpt-image-2 — text rendering that finally works.
For the Chinese version, see README.zh.md.
A Model Context Protocol (MCP) server that exposes OpenAI's gpt-image-2 (released 2026-04-21) and gpt-image-1 to Claude as two tools: text-to-image and multi-reference edit. Built for the IG / poster / product-shot workflow — where dense in-image text rendering and reference fusion actually matter.
Inputs over 4 MB or 1024 px are auto-resized before upload, so you don't have to babysit aspect ratios or compress reference photos by hand.
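The resize rule above can be sketched in a few lines. This is an illustration only, not the code in server.py: the function names are hypothetical, "4 MB" is assumed to mean 4 MiB, "1024 px" is assumed to apply to the longer side, and the aspect ratio is assumed to be preserved. Re-encoding a file that is over the byte limit but already small in pixels is not covered here.

```python
MAX_BYTES = 4 * 1024 * 1024  # assumption: "4 MB" is treated as 4 MiB
MAX_SIDE = 1024              # assumption: limit applies to the longer edge, in pixels


def needs_resize(size_bytes: int, width: int, height: int) -> bool:
    """Return True when the file is too heavy or too large in pixels to send as-is."""
    return size_bytes > MAX_BYTES or max(width, height) > MAX_SIDE


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals max_side, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, leave untouched
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


print(needs_resize(6_000_000, 800, 600))  # True: over the byte limit
print(fit_within(4032, 3024))             # (1024, 768)
```

A typical 12-megapixel phone photo (4032×3024) would shrink to 1024×768 under these assumptions.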
- Two tools, no leaks: gpt_image_generate (text → image) and gpt_image_edit (text + 1–5 references + optional mask → image)
- Auto-resize on input: oversize images get a temp 1024 px PNG cache so the API call doesn't time out or 413
- gpt-image-2 by default, easy fallback to gpt-image-1 for accounts without org verification
- Slash skills /gpt-image and /gpt-image-edit with a curated parameter cheatsheet (size, quality, model fallback, cost-aware defaults)
- Saves to disk and returns a path, so the result survives the 60 s MCP timeout (image still lands on disk even if MCP layer disconnects)
- .env-based key handling so macOS GUI Claude can find your OPENAI_API_KEY without launchd hacks
- Download claude-gpt-image2-mcp-v0.1.0.plugin from Releases.
- In Cowork: Plugins → Install → Select file.
- Set up your API key (paste into the plugin folder's .env):

      cd <plugin-dir>
      cp .env.example .env
      # Edit .env: OPENAI_API_KEY=sk-proj-YOUR_KEY
      chmod 600 .env

- Restart Cowork (Cmd+Q, reopen).
- Try it:
      /gpt-image a minimalist poster, white text on a black background, with the text 「保持好奇」 (Stay Curious)
For users not on Cowork:
- Clone:

      git clone https://github.com/Impossible-Studio-TW/claude-gpt-image2-mcp.git
      cd claude-gpt-image2-mcp
      cp .env.example .env   # then edit .env with your OpenAI key
      chmod 600 .env

- Register in ~/Library/Application Support/Claude/claude_desktop_config.json:

      {
        "mcpServers": {
          "gpt-image2": {
            "command": "bash",
            "args": ["/absolute/path/to/claude-gpt-image2-mcp/run.sh"]
          }
        }
      }

- Restart Claude. First call auto-creates a venv (~30 s); subsequent calls are instant.
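If you would rather script the registration step than edit the JSON by hand, something like the following could work. It is a sketch, not part of the plugin: register_server is a hypothetical helper, and the run.sh path is the same placeholder used above. The demo writes to a temporary folder so it never touches your real config.

```python
import json
import tempfile
from pathlib import Path


def register_server(config_path: Path, run_sh: str, name: str = "gpt-image2") -> dict:
    """Add or update one entry under "mcpServers", keeping any entries already there."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": "bash", "args": [run_sh]}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file that already lists another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "claude_desktop_config.json"
    demo.write_text('{"mcpServers": {"other": {"command": "node"}}}', encoding="utf-8")
    result = register_server(demo, "/absolute/path/to/claude-gpt-image2-mcp/run.sh")
    print(sorted(result["mcpServers"]))  # ['gpt-image2', 'other']
```

To apply it for real, point config_path at the claude_desktop_config.json location named above and pass the absolute path to your clone's run.sh.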
OPENAI_API_KEY is read from a .env file inside the plugin directory — more reliable than shell env on macOS, since GUI apps don't inherit shell config. The .env file is in .gitignore; never commit it.
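To make the idea concrete, here is a standard-library sketch of reading a .env file into the process environment. How the plugin actually loads the file (a library such as python-dotenv, or run.sh itself) is not shown in this README, so load_dotenv_file is a hypothetical name and the parsing rules are simplified assumptions.

```python
import os
import tempfile
from pathlib import Path


def load_dotenv_file(env_path: Path, override: bool = False) -> dict[str, str]:
    """Parse KEY=VALUE lines; set them in os.environ and return everything parsed."""
    parsed: dict[str, str] = {}
    if not env_path.is_file():
        return parsed
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("\"'")  # drop optional surrounding quotes
        if override or key not in os.environ:
            os.environ[key] = value
        parsed[key] = value
    return parsed


# Demo with a throwaway .env holding the placeholder key from the install steps.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("# local secrets\nOPENAI_API_KEY=sk-proj-YOUR_KEY\n", encoding="utf-8")
    parsed = load_dotenv_file(env_file)
    print(sorted(parsed))  # ['OPENAI_API_KEY']
```

With override left at False, a key already exported in the environment wins over the file, which is one reasonable convention but not necessarily the plugin's.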
Default save folder is the user's ~/Pictures/gpt-image2/. Override per call with save_to=<absolute path>, or edit DEFAULT_SAVE_DIR in server.py.
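The override logic might look roughly like this. It is a guess at the behavior, not the code in server.py: it assumes save_to is a full file path rather than a folder, and the timestamped file name is invented for illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

DEFAULT_SAVE_DIR = Path.home() / "Pictures" / "gpt-image2"


def resolve_save_path(save_to: Optional[str] = None,
                      default_dir: Path = DEFAULT_SAVE_DIR) -> Path:
    """Pick the output file: an explicit absolute path, or a new name in the default folder."""
    if save_to is None:
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        return default_dir / f"gpt-image-{stamp}.png"  # hypothetical naming scheme
    target = Path(save_to).expanduser()
    if not target.is_absolute():
        raise ValueError("save_to must be an absolute path")
    return target


print(resolve_save_path("/tmp/poster.png"))
print(resolve_save_path().parent == DEFAULT_SAVE_DIR)  # True
```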
Prerequisites:
- Python 3.10+ (brew install python@3.12 on macOS)
- An OpenAI API key with billing enabled
- For gpt-image-2: organization verification at https://platform.openai.com/settings/organization/general (Persona ID + selfie, ~5 min to submit, up to 1–2 days to propagate). Without verification you can still use gpt-image-1 by passing model="gpt-image-1".
gpt_image_generate(
    prompt: str,
    size: str = "1024x1024",
    quality: str = "auto",
    save_to: str | None = None,
    model: str = "gpt-image-2",
) -> str  # absolute path to saved PNG, or "[Error: ...]"

gpt_image_edit(
    prompt: str,
    image_paths: list[str],        # 1–5 references; multi-image = char-lock attempt
    mask_path: str | None = None,  # transparent areas get edited
    size: str = "1024x1024",
    quality: str = "auto",
    save_to: str | None = None,
    model: str = "gpt-image-2",
) -> str
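Because both tools return either a path or a string starting with "[Error:", a caller can detect failure and retry with the older model. The sketch below shows that pattern with a stand-in function so it runs on its own; generate_with_fallback and fake_generate are hypothetical, and the error text is made up. Retrying on any error is a simplification, since a content-policy rejection would fail on both models.

```python
from typing import Callable


def generate_with_fallback(generate: Callable[..., str], prompt: str, **kwargs) -> str:
    """Try gpt-image-2 first; retry once with gpt-image-1 if an error string comes back.

    Do not pass model= in kwargs; this helper sets it.
    """
    result = generate(prompt=prompt, model="gpt-image-2", **kwargs)
    if result.startswith("[Error:"):
        result = generate(prompt=prompt, model="gpt-image-1", **kwargs)
    return result


# Stand-in for the real tool: pretends the org is not verified yet.
def fake_generate(prompt: str, model: str = "gpt-image-2", **_: object) -> str:
    if model == "gpt-image-2":
        return "[Error: organization not verified]"  # invented message
    return "/tmp/gpt-image2/poster.png"


print(generate_with_fallback(fake_generate, "minimal poster", quality="low"))
# /tmp/gpt-image2/poster.png
```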
| Param | Common values | Notes |
|---|---|---|
| size | 1024x1024 square / 1024x1536 portrait / 1536x1024 landscape | gpt-image-2 supports custom multiples of 16 up to 3840 px, max aspect 3:1 |
| quality | low (~$0.006) / medium (~$0.04) / high (~$0.165) / auto | auto lets the model pick from the prompt complexity |
| model | gpt-image-2 / gpt-image-1 | Pass gpt-image-1 if your org isn't verified yet |
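The custom-size rule in the table can be checked before a call is made. The validator below is a sketch under stated assumptions: it reads "up to 3840 px" as a per-side limit and applies only the three constraints quoted in the table, so the API may enforce others (such as a minimum size) that are not covered. validate_size is a hypothetical helper, not part of the plugin.

```python
import re


def validate_size(size: str) -> tuple[int, int]:
    """Parse "WIDTHxHEIGHT" and check it against the limits quoted in the table."""
    match = re.fullmatch(r"(\d+)x(\d+)", size.strip().lower())
    if match is None:
        raise ValueError(f"expected WIDTHxHEIGHT, got {size!r}")
    width, height = int(match.group(1)), int(match.group(2))
    for side in (width, height):
        if side == 0 or side % 16 != 0:
            raise ValueError(f"{side} is not a positive multiple of 16")
        if side > 3840:
            raise ValueError(f"{side} exceeds the 3840 px limit")
    if max(width, height) > 3 * min(width, height):
        raise ValueError("aspect ratio exceeds 3:1")
    return width, height


print(validate_size("1536x1024"))  # (1536, 1024)
print(validate_size("3840x1280"))  # (3840, 1280), exactly 3:1
```

A request such as "1000x1000" would be rejected here because 1000 is not a multiple of 16.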
Slash skills /gpt-image and /gpt-image-edit ship with a Chinese / English keyword router so phrases like "高品質" ("high quality") or "/gpt-image-edit" map cleanly to the right tool and parameters.
Demo coming soon — see RECORD_DEMO.md for the recording script.
I'm a non-developer building IG content and brand decks daily. Every image-generation tool I tried was great until it had to render Chinese text on a poster — at which point the output became unusable. gpt-image-2 is the first model I've used that gets text right on the first try, and I wanted it inside Claude with the rest of my creative ops, not in another tab. The .env-loading and auto-resize bits are the kind of thing you only notice after a week of failed launches; this plugin folds that pain in once.
| Quality | 1024×1024 | Notes |
|---|---|---|
| low | ~$0.006 | Drafts, fast iteration |
| medium | ~$0.04 | Default-quality production |
| high | ~$0.165 | Best detail / text fidelity |
| auto | varies | Model picks based on prompt complexity |
Edit operations add input image tokens (~$0.02–0.05 extra at low quality).
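For budgeting a session, the per-image figures in the table can be multiplied out. The sketch below uses only the approximate 1024×1024 prices listed above; it leaves out auto (which varies) and the extra input-token cost of edits. estimate_cost is a hypothetical helper.

```python
# Approximate USD per 1024x1024 image, copied from the table above.
PRICE_PER_IMAGE = {"low": 0.006, "medium": 0.04, "high": 0.165}


def estimate_cost(counts: dict[str, int]) -> float:
    """Sum price x count for each quality tier, rounded to tenths of a cent."""
    total = 0.0
    for quality, n in counts.items():
        if quality not in PRICE_PER_IMAGE:
            raise ValueError(f"no fixed price for quality {quality!r}")
        total += PRICE_PER_IMAGE[quality] * n
    return round(total, 3)


# 20 low-quality drafts plus 3 high-quality finals:
# 20 * 0.006 + 3 * 0.165 = 0.12 + 0.495 = 0.615
print(estimate_cost({"low": 20, "high": 3}))  # 0.615
```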
- First-call delay: ~30 s for venv bootstrap; instant afterwards.
- MCP 60 s timeout: high-quality generations can exceed 60 s. The image still saves to disk; check the default folder or your save_to path.
- Multi-image character lock: gpt-image-1 multi-image is weak fusion only (good for style cues, not strict identity preservation). gpt-image-2 (post-verification) is meaningfully better. For true character lock, dedicated tools (Dreamina, Midjourney character ref, Flux PuLID, custom LoRAs) still outperform.
- Content policy: OpenAI rejects prompts featuring real celebrities, copyrighted characters, explicit content, or violence.
- Org verification gate: gpt-image-2 requires verification; until then, fall back to gpt-image-1 via model="gpt-image-1".
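When a call hits the 60 s timeout, the quickest way to recover the result is to look for the newest PNG in the save folder. A small sketch, assuming outputs are saved with a lowercase .png extension; newest_png is a hypothetical helper, not something the plugin ships.

```python
from pathlib import Path
from typing import Optional


def newest_png(folder: Path) -> Optional[Path]:
    """Return the most recently modified .png in folder, or None if there is none."""
    if not folder.is_dir():
        return None
    candidates = [p for p in folder.glob("*.png") if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


# Read-only lookup in the default save folder named earlier in this README.
print(newest_png(Path.home() / "Pictures" / "gpt-image2"))
```

If you passed save_to, check that path directly instead.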
See CONTRIBUTING.md. Issues, PRs, and feature suggestions are welcome. This is a personal project shared as-is, and maintenance is best-effort.
MIT — see LICENSE.
- Built by Icy — Impossible Studio TW / Taipei, Taiwan
- GitHub: @Impossible-Studio-TW
- X / Threads / IG: TBD
PRs welcome — especially for additional model fallbacks, new quality heuristics, or better keyword routing in the slash skills.