A Stable Diffusion WebUI extension that builds prompts using a local LLM. Works with Forge Neo, Forge, and AUTOMATIC1111.
Takes your short prompt and expands it into a detailed description, booru-style tags, or a hybrid of both using a locally running language model. No cloud APIs are used and no data leaves your machine.
- Local LLM powered: uses Ollama or any OpenAI-compatible API
- Four generation modes: Prose (flowing paragraph), Hybrid (tags + NL supplement), Tags (booru tags), Remix (modify existing)
- Mode modifiers: Still (frozen moment), Scene (action over time), Audio (sound cues), available in the Mode dropdown alongside other modifiers
- 130+ categorized modifiers: organized into auto-generated dropdowns: Subject, Setting, Lighting & Mood, Visual Style, Camera, Audio
- Tag generation & validation: Illustrious, NoobAI, Pony, and Anima (retrieval-augmented) formats with auto-downloaded danbooru databases, alias correction, fuzzy matching, deduplication, and standard tag ordering
- Tag post-processing: strips LLM meta-annotations, converts hyphens, escapes parentheses for SD, prefix-matches danbooru disambiguation suffixes
- Wildcards: creative LLM choices such as surprise location, random artist, anime era, narrative detail, and more
- Inline wildcards: `{name?}` placeholders in your prompt
- Local overrides: extend with your own YAML files; each file becomes a dropdown
- Extensible tag formats: add new model support by dropping in a YAML file
- Detail slider: scales output length to the active image model (SD/SDXL/Flux/Z-Image)
- Streaming: real-time token streaming with stall detection, thinking detection, and configurable safeguards
- Cancel button: abort any running generation
- Ollama status: shows version, loaded model, and GPU/CPU mode
- Metadata: all settings saved to generated images and restored when loading
- Works in both txt2img and img2img tabs
You need Ollama running locally:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull the recommended model
ollama pull huihui_ai/qwen3.5-abliterated:9b
# Start Ollama (CPU-only, no VRAM used)
OLLAMA_NUM_GPU=0 OLLAMA_KEEP_ALIVE=0 ollama serve
```

`huihui_ai/qwen3.5-abliterated:9b` (~6 GB): best balance of quality and instruction following. This is the default.
The 4b variant is not recommended as it produces noticeably lower quality output. Larger models (14b+) work well if you have the RAM but are slower on CPU.
"Abliterated" models have refusal behaviors removed, which is useful for unrestricted creative content. Standard models work fine for general use.
Known limitation: Certain combinations of source prompt, modifiers, and wildcards can cause Qwen to enter a repetition loop, generating garbage until the token limit is reached. This shows as a "Truncated" status. If this happens, try removing or changing a modifier; some combinations are simply too complex for a 9B model to synthesize coherently. Larger models handle complex combinations better.
- Open Forge/A1111 WebUI
- Go to Extensions > Install from URL
- Paste: `https://github.com/Gunther-Schulz/sd-webui-prompt-enhancer.git`
- Click Install and restart the WebUI

Alternatively, clone the repository manually:

```bash
cd stable-diffusion-webui/extensions
git clone https://github.com/Gunther-Schulz/sd-webui-prompt-enhancer.git
```

Restart the WebUI.
- Open the Prompt Enhancer accordion in the txt2img or img2img tab
- Type your prompt in the Source Prompt box
- Optionally select modifiers from the categorized dropdowns (Mode, Subject, Setting, Lighting & Mood, etc.)
- Optionally select Wildcards for creative LLM choices
- Choose a generation mode:
  - Prose: click Prose for a flowing paragraph. Best for Flux, SD3, and other natural-language models.
  - Hybrid: click ✨ Hybrid for danbooru-style tags followed by a short NL description. Three-pass pipeline: (1) generates rich prose with wildcards/modifiers, (2) extracts tags from that prose using the selected tag format, (3) summarizes the prose into 1-2 compositional sentences. Best for Illustrious, NoobAI, and Pony; it follows the community-recommended "tags + NL" format, where tags provide precise control and the NL supplement captures spatial relationships, lighting, and mood that tags alone cannot express.
  - Tags: click 🏷 Tags for pure booru-style tags. Two-pass pipeline: (1) generates rich prose (same as Prose mode, with wildcards/modifiers), (2) extracts tags from that prose using the selected tag format. The prose pass gives the LLM room to reason about the scene before compressing it to tags, producing richer, more coherent tag lists than asking for tags directly. Tags are post-processed through validation, correction, reordering, and paren escaping.
Both Hybrid and Tags modes use the tag post-processing pipeline:
- Select a Tag Format: Illustrious, NoobAI, Pony, or Anima (recommended)
- Choose a Tag Validation mode (RAG recommended with Anima)
Tag databases are automatically downloaded on first use (~2-3 MB per format; ~1.1 GB for Anima's FAISS index). Tags are validated, corrected (aliases, common mistakes like `1man` → `1boy`), deduplicated, and reordered into standard danbooru convention. Parentheses in disambiguation suffixes (e.g., `artist_(style)`) are automatically escaped for SD.
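The correction, deduplication, and escaping steps described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: `ALIASES` and `postprocess_tags` are hypothetical names, the alias table holds only the `1man` → `1boy` example from the text, and the space-to-underscore step assumes a format with `use_underscores: true`. Reordering into danbooru convention is omitted.

```python
import re

# Hypothetical alias table; the real extension reads aliases from its tag database.
ALIASES = {"1man": "1boy"}


def postprocess_tags(raw: str) -> str:
    """Normalize, alias-correct, deduplicate, and paren-escape a comma-separated tag string."""
    seen = set()
    out = []
    for tag in raw.split(","):
        # Assumption: the active tag format uses underscores instead of spaces.
        tag = tag.strip().lower().replace(" ", "_")
        if not tag:
            continue
        tag = ALIASES.get(tag, tag)   # alias / common-mistake correction
        if tag in seen:               # deduplicate, keeping first occurrence
            continue
        seen.add(tag)
        # Escape parentheses so SD does not treat them as attention syntax.
        out.append(re.sub(r"([()])", r"\\\1", tag))
    return ", ".join(out)


print(postprocess_tags("1man, 1boy, artist (style)"))
```

Deduplication runs after alias correction on purpose, so that `1man` and `1boy` collapse into a single tag.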
Validation modes:
- RAG: retrieval + embedding validator (Anima format only). A shortlist of real artists/characters/series is injected into the system prompt, and every drafted tag is checked against the FAISS index. Default.
- Fuzzy Strict: alias + fuzzy matching, drop unrecognized
- Fuzzy: alias + fuzzy string matching, keep unrecognized
- Off: raw LLM output, no validation

On truncation (Ollama hit the token or time budget), tag-mode outputs fail loud: the textbox is left empty and a red "Truncated" status reports that there is no output and prompts a retry. A reduced partial result would look like success while silently missing content; the retry path is more honest.
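The difference between Fuzzy and Fuzzy Strict comes down to what happens to a tag with no close match. The sketch below illustrates that with the standard library's `difflib` as a stand-in for the rapidfuzz-based matching the extension uses; `KNOWN_TAGS`, `validate`, and the `0.8` cutoff are assumptions made for the example.

```python
from difflib import get_close_matches

# Hypothetical stand-in for a downloaded tag database.
KNOWN_TAGS = ["long_hair", "blue_eyes", "school_uniform"]


def validate(tags, strict, cutoff=0.8):
    """Correct near-miss tags; drop (strict) or keep (non-strict) unrecognized ones."""
    result = []
    for tag in tags:
        if tag in KNOWN_TAGS:
            result.append(tag)
            continue
        match = get_close_matches(tag, KNOWN_TAGS, n=1, cutoff=cutoff)
        if match:
            result.append(match[0])   # fuzzy correction to a real tag
        elif not strict:
            result.append(tag)        # Fuzzy: keep unrecognized tags
        # Fuzzy Strict: unrecognized tags are dropped
    return result


drafted = ["long_hiar", "blue_eyes", "4k"]
print(validate(drafted, strict=True))
print(validate(drafted, strict=False))
```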
Already have an enhanced prompt and want to tweak it?
- Select new modifiers or wildcards, or update the source prompt
- Click Remix instead of Prose
- The LLM reads the current prompt from the main textarea and integrates the changes without rewriting everything
- Remix auto-detects whether the existing prompt is prose, tags, or hybrid format and applies the appropriate system prompt and post-processing
Random modifiers come in three flavors, plus a combined form. The dropdown label shows which flavor applies via the badge after the name:
| label | guarantee | when it decides | how it decides |
|---|---|---|---|
| 🎲 Random X | LLM-driven | during generation | the LLM picks; relies on model's creativity; collapses onto priors for small models (e.g. qwen3 picks 1920s for Random Era 81% of the time) |
| 🎲 Random X ◆ | pre-picked from DB | before generation | seed-picked from a Danbooru pattern-filtered pool; LLM just renders the chosen value |
| 🎲 Random X ◇ | post-filled if dropped | after generation | LLM generates freely; if the expected category is missing from the output, scene-aware retrieval (FAISS on the generated prose) + seed pick injects a real tag |
| 🎲 Random X ◆◇ | both | pre-pick and safety-net | strongest: pre-picked value steers prose, and if the LLM ignores the directive, post-fill still injects the category |

Mnemonic: filled diamond ◆ = committed upfront (pre-pick). Hollow diamond ◇ = fills a gap after (post-fill).
Two randomization mechanisms side-by-side:
- `source:` (pre-pick, ◆): resolves before the LLM runs. The wildcard turns into a concrete Danbooru tag via regex against the DB (`^\d{4}s_\(style\)$` for Random Era, `_(flower)$` for Random Flower, etc.). The picked value is injected into the system prompt as if you had selected it directly. The LLM's job is to render the chosen thing, not to choose.
- `target_slot:` (post-fill, ◇): runs after the LLM's output. Checks whether the expected Danbooru category (artist, copyright, character) was covered by the LLM. If missing, `_retrieve_prose_slot` does a FAISS search of the generated prose against that DB category, reranks to top-10, and seed-picks one to inject as a tag. Deterministic for a given prose + seed.
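The pre-pick mechanism can be pictured as a regex filter over the tag database followed by a seeded choice. The sketch below is an illustration under assumptions: `pre_pick` is a hypothetical function, the database is modelled as a list of `(name, post_count)` pairs, and only the `1970s_(style)` row with post count 992 comes from this README; the other rows are made up.

```python
import random
import re


def pre_pick(tags, db_pattern, min_post_count, seed):
    """Return a seed-picked tag name from the pattern-filtered pool, or None if the pool is empty.

    tags: iterable of (name, post_count) pairs (hypothetical DB shape).
    """
    pattern = re.compile(db_pattern)
    # Sort the pool so the pick depends only on the seed, not on DB row order.
    pool = sorted(
        name for name, count in tags
        if count >= min_post_count and pattern.search(name)
    )
    if not pool:
        return None
    return random.Random(seed).choice(pool)


sample_db = [
    ("1970s_(style)", 992),
    ("1980s_(style)", 1500),   # illustrative row
    ("1600s_(style)", 3),      # illustrative row, below the popularity floor
    ("retro_artstyle", 9000),  # illustrative row, does not match the pattern
]
print(pre_pick(sample_db, r"^\d{4}s_\(style\)$", min_post_count=50, seed=42))
```

Using a private `random.Random(seed)` instance keeps the pick reproducible without touching the global random state.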
Data-driven randoms currently shipped:
| modifier | mechanism | DB pool | dropdown |
|---|---|---|---|
| Random Era | `source:` | decade styles (`^\d{4}s_\(style\)$`), 7 tags | Setting |
| Random Flower | `source:` | `_(flower)$`, 73 tags | Setting |
| Random Food | `source:` | `_(food)$`, 79 tags | Setting |
| Random Animal | `source:` | `_(animal)$`, 31 tags | Setting |
| Random Constellation | `source:` | `_(constellation)$`, 52 tags | Setting |
| Random Tarot Card | `source:` | `_(tarot)$`, 23 tags | Visual Style |
| Random Symbol | `source:` | `_(symbol)$`, 38 tags | Visual Style |
| Random Artist | `target_slot: artist` | category=1 (Danbooru artists) | Visual Style |
| Random Franchise | `target_slot: copyright` | category=3 | Visual Style |
Other 🎲 modifiers are LLM-driven; they work, but can collapse toward training-data priors.
Adding your own data-driven random in a local override:
```yaml
# PROMPT_ENHANCER_LOCAL/my.yaml
decor:
  🎲 Random Gem:
    behavioral: ""
    keywords: ""
    source:
      db_pattern: "_\\(gemstone\\)$"   # any regex against category=general tags
      min_post_count: 50               # popularity floor; blocks niche/junk
      template: "Include {display} as a prop or ornament in the scene."
```

The UI auto-detects the `source:` key and appends the ◆ badge. No Python changes needed.
Console trace: when a `source:`-driven random fires, the Forge console logs `[PromptEnhancer] Random pick (Random Era): 1970s_(style) (pool=7, post_count=992)` so you can see what was chosen and how big the pool was.
Use `{name?}` placeholders in your source prompt for the LLM to fill creatively:

```
a woman sitting in a {location?} wearing {outfit?} during {time?}
```
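Detecting these placeholders is a small regex job. The following is a sketch of one way to do it; the `PLACEHOLDER` pattern and `find_placeholders` are hypothetical and assume names consist of word characters only.

```python
import re

# Assumption: a placeholder is {name?} where name is letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{(\w+)\?\}")


def find_placeholders(prompt: str) -> list:
    """Return the placeholder names in the order they appear in the prompt."""
    return PLACEHOLDER.findall(prompt)


prompt = "a woman sitting in a {location?} wearing {outfit?} during {time?}"
print(find_placeholders(prompt))
```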
Click Cancel to abort any running generation. Works reliably across multiple clicks.
| Setting | Default | Description |
|---|---|---|
| Mode | (none) | Dropdown: Still (frozen moment), Scene (action over time), Audio (sound cues) |
| Base | Default | System prompt template (Default or Custom) |
| Tag Format | Illustrious | Tag output format: Illustrious, NoobAI, Pony, Anima (for Tags and Hybrid buttons) |
| Tag Validation | RAG | How to validate generated tags: RAG (Anima only), Fuzzy Strict, Fuzzy, Off |
| Modifiers | (none) | Multiple categorized dropdowns auto-generated from YAML files |
| Wildcards | (none) | Creative delegation: let the LLM make choices |
| Detail | 0 (auto) | Output length: 0=auto, 1=minimal ... 10=extensive, scales to model |
| Temperature | 0.8 | Creativity (0 = deterministic, 2 = creative) |
| Think | off | Let model reason before answering (slower) |
| Seed | -1 (random) | LLM seed for reproducibility. Fixed seed = same output for same input |
| API URL | `http://localhost:11434` | Ollama API endpoint |
| Model | `huihui_ai/qwen3.5-abliterated:9b` | LLM model (auto-detected from Ollama) |
| Variable | Default | Description |
|---|---|---|
| `PROMPT_ENHANCER_LOCAL` | (none) | Comma-separated directories for local modifier overrides |
| `PROMPT_ENHANCER_STALL_TIMEOUT` | 10 | Abort if no tokens received for this many seconds |
| `PROMPT_ENHANCER_MAX_TOKENS` | 1000 | Hard cap on output tokens |
| `PROMPT_ENHANCER_MAX_TIME` | 60 | Hard cap on total generation time in seconds |
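For reference, reading these variables with their documented defaults might look like the sketch below. `load_settings` and the returned key names are hypothetical; only the variable names and default values come from the table.

```python
import os


def load_settings(env=None):
    """Read the PROMPT_ENHANCER_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    local = env.get("PROMPT_ENHANCER_LOCAL", "")
    return {
        # Comma-separated list; surrounding whitespace is stripped, empty entries are skipped.
        "local_dirs": [p.strip() for p in local.split(",") if p.strip()],
        "stall_timeout": float(env.get("PROMPT_ENHANCER_STALL_TIMEOUT", "10")),
        "max_tokens": int(env.get("PROMPT_ENHANCER_MAX_TOKENS", "1000")),
        "max_time": float(env.get("PROMPT_ENHANCER_MAX_TIME", "60")),
    }


print(load_settings({"PROMPT_ENHANCER_MAX_TOKENS": "1500"}))
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise without changing the process environment.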
Modifiers are organized into YAML files in the modifiers/ directory. Each file becomes a dropdown in the UI:
| Dropdown | Categories |
|---|---|
| Mode | mode (Still, Scene, Audio) |
| Subject | genre, subject, activity, relationship |
| Setting | setting, time period, aesthetic |
| Lighting & Mood | lighting, mood, atmosphere, emotion |
| Visual Style | color, art style, anime (25+ sub-styles), cinema style, photography format, vintage format |
| Camera | perspective, distance, focus, technique, motion, material |
| Audio | ambient types, silence |
Tag format definitions live in tag-formats/ as YAML files. Each file defines:
```yaml
system_prompt: |
  (LLM instructions for generating tags)
use_underscores: true
tag_db: illustrious.csv
tag_db_url: https://...
```

Add support for new models by dropping in a YAML file; no code changes are needed.
The Anima tag format uses a richer pipeline than the rapidfuzz-based validation of the other formats:
- Shortlist retrieval: before prose generation, real Danbooru artists / characters / series that match the source prompt are pre-retrieved and injected into the LLM's system prompt. This prevents hallucinated names like `@takashi_murowo` at the source.
- Embedding-based validator: every LLM-drafted tag is checked against a 273k-entry FAISS index using bge-m3 embeddings. Real Danbooru tags pass through; phrase-shape hallucinations (`4k`, `detailed_background`, `animedia`) are dropped.
- Character → series pairing via a pre-computed co-occurrence table (e.g. `hatsune_miku` → `vocaloid` added automatically).
- Artist/character signatures: artist embeddings include their top co-occurring general tags from 500k real Danbooru posts, so the retriever can match on style/theme rather than just name.
Zero setup needed: the extension's `install.py` downloads ~1.1 GB of pre-built index artefacts from HuggingFace on first load. The bge-m3 embedder and bge-reranker models auto-download via sentence-transformers (~3.4 GB total) on first Anima click.

Settings live under Settings → Anima Tagger: threshold, reranker toggle, co-occurrence pairing toggle, query expansion toggle.
Scripts under `src/anima_tagger/scripts/` are the maintainer workflow for (re)building the artefacts when Danbooru data updates:

```bash
# 1. Pull latest Danbooru tag dump + post dataset from HF
python src/anima_tagger/scripts/download_data.py
# 2. Rebuild sqlite + FAISS index + co-occurrence (~10 min on GPU)
python src/anima_tagger/scripts/build_index.py
# 3. Upload fresh artefacts to HF (auth via `hf auth login` first)
python src/anima_tagger/scripts/package_artifacts.py
# 4. (Optional) verify retrieval quality
python src/anima_tagger/scripts/verify.py
python src/anima_tagger/scripts/full_pipeline_test.py
```

End users never run these.
Extend the extension with your own modifiers and base prompts. Each YAML file in a local directory becomes its own dropdown in the UI.
Set the PROMPT_ENHANCER_LOCAL environment variable to one or more comma-separated directories:
```bash
PROMPT_ENHANCER_LOCAL="/home/user/my-modifiers, /home/user/experimental"
```

The Local Overrides field in the UI can refresh the content of existing dropdowns. New files require a Forge restart to create new dropdowns.
Each .yaml file becomes a dropdown. The filename determines the dropdown label:
```
/home/user/my-modifiers/
  _bases.yaml       # extends the Base dropdown (underscore prefix = special)
  nsfw.yaml         # creates "Nsfw" dropdown
  my-styles.yaml    # creates "My Styles" dropdown
```
Files with the same name as published ones (e.g., subject.yaml) merge their content into the existing dropdown.
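The filename-to-label rule implied by the examples above could be sketched like this. `dropdown_label` is a hypothetical helper inferred from the two examples (`nsfw.yaml`, `my-styles.yaml`); the extension's actual rule may differ, for instance for published dropdowns such as "Lighting & Mood".

```python
from pathlib import Path


def dropdown_label(filename: str):
    """Derive a dropdown label from a YAML filename; None for underscore-prefixed special files."""
    stem = Path(filename).stem
    if stem.startswith("_"):
        return None  # e.g. _bases.yaml extends the Base dropdown instead of creating one
    # Assumption: hyphens and underscores separate words, and each word is capitalized.
    words = stem.replace("_", " ").replace("-", " ").split()
    return " ".join(word.capitalize() for word in words)


for name in ["nsfw.yaml", "my-styles.yaml", "_bases.yaml"]:
    print(name, "->", dropdown_label(name))
```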
All files use the same two-level format: categories containing named keyword strings.

```yaml
# my-styles.yaml: becomes the "My Styles" dropdown
my category:
  Cozy Autumn: autumn, warm tones, falling leaves, golden light, wood smoke
  Rainy Tokyo: tokyo streets, neon reflections, rain, umbrellas, night
```

`_bases.yaml` extends or replaces entries in the Base dropdown. `_prompts.yaml` overrides the operational prompts (Remix, summarize, wildcard preamble, negative contract). Both merge with the published defaults; anything you don't override stays.
```yaml
# _bases.yaml
My Custom Base: |
  You are a prompt writer. Given a user's raw input, expand it into a
  detailed scene description...
```

For every non-Custom base, the final system prompt is:
```
_preamble              # shared: input handling
your base body         # per-base: style and content rules
_format                # shared: no headings, no line breaks, no commentary
[detail instruction]   # added when Detail slider > 0
```
Then during generation these sections are appended in order: `SOURCE PROMPT: ...`, `Apply these styles: ...` for selected modifiers, `wildcard_preamble` plus each wildcard instruction, and (when the + Negative checkbox is on) the negative block with its `POSITIVE:`/`NEGATIVE:` contract.
Override _preamble or _format in _bases.yaml to change shared behavior for all bases. Select the Custom base to bypass the wrapping entirely and supply a raw system prompt from the UI.
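The wrapping order can be sketched as a simple concatenation. This is an illustration only: `build_system_prompt`, its parameters, the blank-line separator, and the wording of the detail instruction are all assumptions; the README specifies only the order of the parts and the Custom bypass.

```python
def build_system_prompt(base_name, base_body, preamble, fmt, detail=0, custom_prompt=""):
    """Assemble preamble + base body + format rules (+ detail instruction) in that order."""
    if base_name == "Custom":
        # The Custom base bypasses the wrapping and uses the raw prompt from the UI.
        return custom_prompt
    parts = [preamble, base_body, fmt]
    if detail > 0:
        # Hypothetical wording; the real detail instruction is not shown in this README.
        parts.append(f"Detail level: {detail}/10.")
    return "\n\n".join(p.strip() for p in parts if p.strip())


print(build_system_prompt(
    "My Custom Base",
    "You are a prompt writer.",
    preamble="Treat the input as a raw idea.",
    fmt="No headings, no line breaks, no commentary.",
    detail=3,
))
```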
LLMs heavily mirror the structure of their system prompt. If you describe the output shape using labeled bullets, the model will often echo those labels as section headers in its response:
```yaml
# BAD: the model emits "Patched Prompt:" / "Creative Choice:" in the output
My Base: |
  Output the patched prompt.
  Instruction blocks below may include:
  - Instruction: free-form text...
  - Creative choice blocks: optional...
```

```yaml
# GOOD: prose rules, no template structure, explicit anti-label coda
My Base: |
  The text below contains free-form directives, style keywords, and
  optional wildcard prompts. Apply directives literally, weave style
  keywords naturally, and treat wildcards as optional.
  Output only the updated prompt as raw text. No section headers, no
  labels, no prefaces like "Patched:" or "Result:".
```

Rules of thumb:
- Describe the LLM's job in imperative prose, not the input/output shape via labeled templates.
- Avoid echo-bait nouns in your system prompt (words like `patched`, `block`, `section`). If the model is going to invent a header, it picks one it saw in your prompt.
- When output structure matters, end with an explicit anti-label clause naming the concrete bad prefixes; the model avoids exactly what you tell it to avoid.
- Short imperative sentences beat essays. "Write short, direct sentences." is more reliable than a paragraph describing desired voice.
- Include concrete contrasts, not abstract rules. `"thin black leather choker with a small metal ring"` not `"elegant necklace"` teaches the model more than "be specific".
- Use the same vocabulary you want in the output. If you want `"floorboards creak"`, don't just say "add ambient sound"; show the tone.
- Avoid dramatic or marketing language in the rules themselves; the model picks up tone from your examples.
- Turn on the Think checkbox to see the model's reasoning. This is useful for diagnosing why it picked a particular structure.
- Try empty-source + active wildcards, conflicting modifiers, and instruction-style source prompts ("make it darker") as edge cases.
- Watch for any words or phrases from your system prompt that appear verbatim in output. That is mirroring, and it usually means you need to reframe that section as prose instead of labeled structure.
- Prose: source prompt is sent to the local LLM with system prompt (base + modifiers + detail level) and wildcards in the user message. The LLM returns a detailed flowing paragraph.
- Hybrid: three-pass pipeline. Pass 1 generates prose (same as Prose mode). Pass 2 extracts danbooru tags from that prose using the selected tag format's system prompt. Pass 3 summarizes the prose into 1-2 compositional sentences. Tags go through the full post-processing pipeline. All three passes use the same seed for consistency.
- Tags: two-pass pipeline. Pass 1 generates prose (same as Prose mode). Pass 2 extracts danbooru tags from that prose using the selected tag format's system prompt. Tags go through the full post-processing pipeline. Both passes use the same seed for consistency. On Anima + RAG, the same shortlist + embedding validator as Hybrid applies; on truncation, tag mode fails loud (empty output + retry status) rather than delivering a partial list that looks complete.
- Remix: the current prompt is sent back to the LLM with new modifiers/wildcards to integrate. Auto-detects prose, tags, or hybrid format and uses the appropriate refine prompt and post-processing.
- Streaming: all LLM calls use streaming with real-time stall detection and thinking mode detection. `/no_think` is prepended to prevent Qwen3 models from entering thinking mode.
- The output is written to the main prompt textbox and all settings are saved to image metadata for reproducibility.
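The streaming safeguards (stall timeout, token cap, time cap) can be illustrated with a generic token iterator. The sketch below is not the extension's implementation: `guarded_stream` and its status strings are hypothetical, and a background thread plus queue is just one way to detect a stall on a blocking stream. The defaults mirror the documented environment variable defaults.

```python
import queue
import threading
import time


def guarded_stream(token_iter, stall_timeout=10.0, max_tokens=1000, max_time=60.0):
    """Consume a token iterator; return (text, status) with status ok, stalled, or truncated."""
    q = queue.Queue()
    done = object()  # sentinel marking normal end of stream

    def pump():
        for tok in token_iter:
            q.put(tok)
        q.put(done)

    # Daemon thread so a stuck stream cannot keep the process alive.
    threading.Thread(target=pump, daemon=True).start()
    start = time.monotonic()
    tokens = []
    while True:
        try:
            item = q.get(timeout=stall_timeout)
        except queue.Empty:
            return "".join(tokens), "stalled"      # no token within stall_timeout seconds
        if item is done:
            return "".join(tokens), "ok"
        tokens.append(item)
        if len(tokens) >= max_tokens or time.monotonic() - start > max_time:
            return "".join(tokens), "truncated"    # hit the token or time budget


print(guarded_stream(iter(["a ", "quiet ", "street"])))
```

In tag modes the caller would discard the partial text on a `truncated` status, matching the fail-loud behaviour described earlier.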
MIT