Impossible-Studio-TW/claude-gpt-image2-mcp

GPT Image 2 MCP

Generate and edit images in Claude with OpenAI's gpt-image-2 — text rendering that finally works.

License: MIT | Cowork Plugin | Status: Stable | Default model: gpt-image-2

For the Chinese version, see README.zh.md

What is this?

A Model Context Protocol (MCP) server that exposes OpenAI's gpt-image-2 (released 2026-04-21) and gpt-image-1 to Claude as two tools: text-to-image and multi-reference edit. Built for the IG / poster / product-shot workflow — where dense in-image text rendering and reference fusion actually matter.

Inputs over 4 MB or 1024 px are auto-resized before upload, so you don't have to babysit aspect ratios or compress reference photos by hand.
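
As a rough illustration of the resize rule above, here is a minimal sketch of the threshold check and the target-size calculation. The function names are hypothetical and are not taken from server.py. The sketch assumes "1024 px" refers to the longest side, and it only covers dimensions; how the server handles a file that is over 4 MB but already within 1024 px is not documented here.

```python
MAX_BYTES = 4 * 1024 * 1024   # 4 MB threshold quoted in this README
MAX_SIDE = 1024               # assumed to mean the longest side, in pixels


def needs_resize(size_bytes: int, width: int, height: int) -> bool:
    """Return True when an input exceeds either threshold."""
    return size_bytes > MAX_BYTES or max(width, height) > MAX_SIDE


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> tuple:
    """Scale (width, height) down so the longest side is at most max_side.

    Aspect ratio is preserved and images are never upscaled.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With these definitions, a 4000×3000 photo maps to 1024×768, while an 800×600 image is left unchanged.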

Features

  • Two tools, no leaks: gpt_image_generate (text → image) and gpt_image_edit (text + 1–5 references + optional mask → image)
  • Auto-resize on input: oversize images get a temp 1024 px PNG cache so the API call doesn't time out or fail with HTTP 413 (payload too large)
  • gpt-image-2 by default, easy fallback to gpt-image-1 for accounts without org verification
  • Slash skills /gpt-image and /gpt-image-edit with a curated parameter cheatsheet (size, quality, model fallback, cost-aware defaults)
  • Saves to disk and returns a path — survives the 60 s MCP timeout (image still lands on disk even if MCP layer disconnects)
  • .env-based key handling so macOS GUI Claude can find your OPENAI_API_KEY without launchd hacks
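
Because the image is written to disk even when the MCP layer disconnects, one way to recover a result after a timeout is to look for the most recently modified PNG in the save folder. The helper below is a hypothetical sketch, not part of the plugin; it assumes outputs are PNG files sitting directly in that folder.

```python
from pathlib import Path
from typing import Optional


def newest_image(folder: str, pattern: str = "*.png") -> Optional[Path]:
    """Return the most recently modified file matching pattern, or None."""
    directory = Path(folder).expanduser()
    if not directory.is_dir():
        return None
    candidates = [p for p in directory.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Modification time is a reasonable proxy for "last generated".
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Calling newest_image("~/Pictures/gpt-image2") after a dropped call would return the latest saved file, if one exists.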

Quick Start (Cowork users)

  1. Download claude-gpt-image2-mcp-v0.1.0.plugin from Releases.
  2. In Cowork: Plugins → Install → Select file.
  3. Set up your API key (paste into the plugin folder's .env):
    cd <plugin-dir>
    cp .env.example .env
    # Edit .env: OPENAI_API_KEY=sk-proj-YOUR_KEY
    chmod 600 .env
  4. Restart Cowork (Cmd+Q, reopen).
  5. Try it: /gpt-image A minimalist poster, white text on a black background, with the text: 「保持好奇」 (Stay Curious)

Quick Start (Claude Desktop / Claude Code users)

For users not on Cowork:

  1. Clone:
    git clone https://github.com/Impossible-Studio-TW/claude-gpt-image2-mcp.git
    cd claude-gpt-image2-mcp
    cp .env.example .env   # then edit .env with your OpenAI key
    chmod 600 .env
  2. Register in ~/Library/Application Support/Claude/claude_desktop_config.json:
    {
      "mcpServers": {
        "gpt-image2": {
          "command": "bash",
          "args": ["/absolute/path/to/claude-gpt-image2-mcp/run.sh"]
        }
      }
    }
  3. Restart Claude. First call auto-creates a venv (~30 s); subsequent calls are instant.
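
If you would rather not hand-edit the JSON in step 2, the registration can be scripted. This is a minimal sketch with hypothetical helper names; it assumes the config file is plain JSON with a top-level "mcpServers" object, as in the snippet above, and it leaves any other registered servers in place.

```python
import json
from pathlib import Path


def register_server(config: dict, run_sh: str, name: str = "gpt-image2") -> dict:
    """Return a copy of config with the server entry added under mcpServers."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": "bash", "args": [run_sh]}
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_path: str, run_sh: str) -> None:
    """Read the config file (if present), add the entry, and write it back."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = register_server(config, run_sh)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

Pass the absolute path to run.sh, for example update_config_file("~/Library/Application Support/Claude/claude_desktop_config.json", "/absolute/path/to/claude-gpt-image2-mcp/run.sh").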

Configuration

OPENAI_API_KEY is read from a .env file inside the plugin directory — more reliable than shell env on macOS, since GUI apps don't inherit shell config. The .env file is in .gitignore; never commit it.
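
To make the .env approach concrete, here is a minimal stdlib sketch of reading KEY=VALUE pairs from such a file. It is an illustration only: how server.py actually loads the file is not shown in this README (it may well use a library such as python-dotenv), and parse_env is a hypothetical name.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```

A server could then call parse_env on the file contents and read values["OPENAI_API_KEY"], independent of whatever environment the GUI app was launched with.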

Default save folder is the user's ~/Pictures/gpt-image2/. Override per call with save_to=<absolute path>, or edit DEFAULT_SAVE_DIR in server.py.
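
The override logic described above can be pictured roughly as follows. This is a sketch under assumptions, not the code in server.py: it treats save_to as a full file path, and the timestamped default filename is invented for illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Mirrors the default folder named in this README.
DEFAULT_SAVE_DIR = Path.home() / "Pictures" / "gpt-image2"


def resolve_save_path(save_to: Optional[str], now: Optional[datetime] = None) -> Path:
    """Pick the output path: an explicit absolute save_to wins, else the default folder."""
    if save_to:
        target = Path(save_to).expanduser()
        if not target.is_absolute():
            raise ValueError("save_to must be an absolute path")
        return target
    # Hypothetical naming scheme: gpt-image2-YYYYMMDD-HHMMSS.png
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return DEFAULT_SAVE_DIR / f"gpt-image2-{stamp}.png"
```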

Prerequisites:

  • Python 3.10+ (brew install python@3.12 on macOS)
  • An OpenAI API key with billing enabled
  • For gpt-image-2: organization verification at https://platform.openai.com/settings/organization/general (Persona ID + selfie, ~5 min to submit, up to 1–2 days to propagate). Without verification you can still use gpt-image-1 by passing model="gpt-image-1".

Tools (MCP API)

gpt_image_generate(
    prompt: str,
    size: str = "1024x1024",
    quality: str = "auto",
    save_to: str | None = None,
    model: str = "gpt-image-2",
) -> str   # absolute path to saved PNG, or "[Error: ...]"

gpt_image_edit(
    prompt: str,
    image_paths: list[str],         # 1–5 references; multi-image = char-lock attempt
    mask_path: str | None = None,   # transparent areas get edited
    size: str = "1024x1024",
    quality: str = "auto",
    save_to: str | None = None,
    model: str = "gpt-image-2",
) -> str

| Param | Common values | Notes |
|---|---|---|
| size | 1024x1024 (square) / 1024x1536 (portrait) / 1536x1024 (landscape) | gpt-image-2 supports custom sizes in multiples of 16, up to 3840 px, with a maximum aspect ratio of 3:1 |
| quality | low (~$0.006) / medium (~$0.04) / high (~$0.165) / auto | auto lets the model pick based on prompt complexity |
| model | gpt-image-2 / gpt-image-1 | Pass gpt-image-1 if your org isn't verified yet |
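
The custom-size rule for gpt-image-2 (multiples of 16, up to 3840 px, aspect ratio no wider than 3:1) can be checked before making a call. The validator below is a hypothetical sketch that encodes only the constraints stated in this README; any other limits the API may enforce, and the different rules for gpt-image-1, are not covered.

```python
def validate_size(size: str, max_side: int = 3840, step: int = 16,
                  max_aspect: float = 3.0) -> tuple:
    """Parse 'WIDTHxHEIGHT' and check it against the stated gpt-image-2 rules."""
    try:
        width_text, height_text = size.lower().split("x")
        width, height = int(width_text), int(height_text)
    except ValueError:
        raise ValueError(f"size must look like '1024x1024', got {size!r}") from None
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width % step or height % step:
        raise ValueError(f"width and height must be multiples of {step}")
    if max(width, height) > max_side:
        raise ValueError(f"longest side must be at most {max_side} px")
    if max(width, height) > max_aspect * min(width, height):
        raise ValueError("aspect ratio must not exceed 3:1")
    return width, height
```

For example, "1024x1536" and "3840x1280" pass, while "1000x1000" (not a multiple of 16) and "3840x1024" (wider than 3:1) are rejected.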

Slash skills /gpt-image and /gpt-image-edit ship with a Chinese / English keyword router, so phrases like "高品質" ("high quality") or "/gpt-image-edit" map cleanly to the right tool and parameters.
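
To give a feel for what keyword routing means here, the following is a purely illustrative sketch. The real skills may be implemented quite differently (for instance as prompt instructions rather than code); apart from "高品質", every keyword in the table is a made-up example, and route is a hypothetical name.

```python
QUALITY_KEYWORDS = {
    "高品質": "high",        # phrase quoted in this README
    "high quality": "high",  # hypothetical English equivalent
    "draft": "low",          # hypothetical
}


def route(message: str) -> dict:
    """Pick a tool from the slash command and a quality level from keywords."""
    text = message.strip()
    tool = "gpt_image_edit" if text.startswith("/gpt-image-edit") else "gpt_image_generate"
    lowered = text.lower()
    quality = "auto"  # matches the documented default
    for keyword, level in QUALITY_KEYWORDS.items():
        if keyword in lowered:
            quality = level
            break
    return {"tool": tool, "quality": quality}
```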

Demo

Demo coming soon — see RECORD_DEMO.md for the recording script.

Why this exists

I'm a non-developer building IG content and brand decks daily. Every image-generation tool I tried was great until it had to render Chinese text on a poster, at which point the output became unusable. gpt-image-2 is the first model I've used that gets text right on the first try, and I wanted it inside Claude with the rest of my creative ops, not in another tab. The .env loading and auto-resize handling are the kind of thing you only notice after a week of failed launches; this plugin deals with them once so you don't have to.

Pricing reference

| Quality | 1024×1024 | Notes |
|---|---|---|
| low | ~$0.006 | Drafts, fast iteration |
| medium | ~$0.04 | Default-quality production |
| high | ~$0.165 | Best detail / text fidelity |
| auto | varies | Model picks based on prompt complexity |

Edit operations add input image tokens (~$0.02–0.05 extra at low quality).
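
For budgeting a batch, the approximate figures above translate into simple arithmetic. The sketch below hard-codes the 1024×1024 prices from the table; they are rough reference values from this README, not a billing guarantee, and estimate_cost is a hypothetical helper.

```python
# Approximate per-image prices in USD at 1024x1024, copied from the table above.
PRICE_PER_IMAGE_USD = {"low": 0.006, "medium": 0.04, "high": 0.165}


def estimate_cost(count: int, quality: str, edit_surcharge: float = 0.0) -> float:
    """Estimate total USD for `count` images; edit_surcharge is extra per image."""
    if quality not in PRICE_PER_IMAGE_USD:
        raise ValueError(f"no fixed price for quality {quality!r}")
    if count < 0:
        raise ValueError("count must be non-negative")
    return round(count * (PRICE_PER_IMAGE_USD[quality] + edit_surcharge), 4)
```

Ten medium-quality generations come to roughly $0.40, and twenty high-quality ones to roughly $3.30. The auto setting has no fixed price, so the sketch rejects it.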

Known caveats

  • First-call delay: ~30 s for venv bootstrap; instant afterwards.
  • MCP 60 s timeout: high-quality generations can exceed 60 s. The image still saves to disk — check the default folder or your save_to path.
  • Multi-image character lock: gpt-image-1 multi-image is weak fusion only (good for style cues, not strict identity preservation). gpt-image-2 (post-verification) is meaningfully better. For true character lock, dedicated tools (Dreamina, Midjourney character ref, Flux PuLID, custom LoRAs) still outperform.
  • Content policy: OpenAI rejects prompts featuring real celebrities, copyrighted characters, explicit content, or violence.
  • Org verification gate: gpt-image-2 requires verification; until then, fall back to gpt-image-1 via model="gpt-image-1".
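
The model fallback mentioned in the caveats can be wrapped in a small helper on the calling side. This is a sketch under assumptions: it relies on the documented convention that the tools return either a path or a string starting with "[Error:", and it retries on any error, whereas a real caller would probably inspect the message first. The generate argument stands in for whatever function invokes gpt_image_generate.

```python
from typing import Callable


def generate_with_fallback(generate: Callable[..., str], prompt: str, **kwargs) -> str:
    """Try gpt-image-2 first; on an error string, retry once with gpt-image-1."""
    result = generate(prompt=prompt, model="gpt-image-2", **kwargs)
    if not result.startswith("[Error:"):
        return result
    # Assumption: the failure may be the verification gate, so retry once.
    return generate(prompt=prompt, model="gpt-image-1", **kwargs)
```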

Contributing

See CONTRIBUTING.md. Issues, PRs, and feature suggestions are welcome. This is a personal project shared as-is, and maintenance is best-effort.

License

MIT — see LICENSE.

Author

  • Built by Icy — Impossible Studio TW / Taipei, Taiwan
  • GitHub: @Impossible-Studio-TW
  • X / Threads / IG: TBD

PRs welcome — especially for additional model fallbacks, new quality heuristics, or better keyword routing in the slash skills.
