[P0] parse_prd/expand_tasks fail (exit 1, empty stderr) under claude-code provider inside a nested Claude Code/MCP session — add nested-context preflight + provider re-route

## Problem

When the Atlas engine runs `parse_prd` / `expand_tasks` from **inside an existing Claude Code or MCP session** (the normal autonomous-build path), they fail with **exit 1 and empty stderr** — no tasks are written and the error is opaque. The research-role path (perplexity) still works, which is why the failure looks selective and confusing.

Root cause (confirmed): the upstream `task-master-ai` CLI's `claude-code` provider spawns `claude -p "<prompt>"` as a child process to borrow the authenticated session. Inside an already-nested Claude Code session that recursive headless spawn is hard-refused ("Claude Code cannot be launched inside another Claude Code session"), so the child dies with exit 1 / empty stderr. This plugin cannot fix the upstream spawn — but it **owns the preflight/validation/skill guards** and currently has **none**: a repo-wide grep for `CLAUDECODE` / `CLAUDE_CODE_ENTRYPOINT` / `claude -p` / `nested` / `headless` returns zero matches in source. Worse, the plugin actively steers users toward `claude-code` via `--claude-code` fix hints. This is P0 because it silently breaks the core parse→expand pipeline in exactly the headless/MCP context Atlas is designed to run in.
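The inheritance step can be reproduced without a `claude` binary. Below is a minimal sketch that uses a local Python interpreter as a stand-in child. A parent env built the way `_build_env` builds it (copy everything, drop only `TASK_MASTER_PROJECT_ROOT`) still carries `CLAUDECODE`, so any child it spawns sees the parent-session marker. `child_sees_claudecode` is a hypothetical helper for illustration only.

```python
import os
import subprocess
import sys

def child_sees_claudecode(parent_env: dict) -> str:
    """Spawn a throwaway Python child with parent_env and report the CLAUDECODE it sees."""
    env = dict(parent_env)
    # mirror _build_env: copy everything, drop only TASK_MASTER_PROJECT_ROOT
    env.pop("TASK_MASTER_PROJECT_ROOT", None)
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('CLAUDECODE', ''))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    parent = os.environ.copy()
    parent["CLAUDECODE"] = "1"  # simulate running inside a Claude Code session
    print(child_sees_claudecode(parent))  # the child inherits the marker
```

If you replace the stand-in child with `claude -p`, you get the spawn the upstream provider performs. That is why the refusal happens before any model call is made.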

## Current Behavior

Verified against the source at `/home/anombyte/Shade_Gen/Projects/prd-taskmaster-plugin`:

- **No nested-context detection anywhere.** `mcp-server/taskmaster.py:_build_env` (lines 11-15) is the only place the subprocess env is assembled — it does `env = os.environ.copy(); env.pop("TASK_MASTER_PROJECT_ROOT", None)` and never inspects `CLAUDECODE` / `CLAUDE_CODE_ENTRYPOINT` or the configured provider.
- **`validate_setup` only checks that a model is *configured*, not *runnable*.** In `mcp-server/capabilities.py:validate_setup` (lines 298-450) Check 5 sets `provider_ok = bool(main_model)` — it never checks whether the provider can host a child spawn in the current context, and never key-checks `ANTHROPIC_API_KEY` / `PERPLEXITY_API_KEY`. Its fix hints actively recommend `task-master models --set-main sonnet --claude-code` and `--set-research opus --claude-code`.
- **The parse/expand skill steps have no spawn-failure branch.** `skills/generate/SKILL.md` Step 4 (parse, lines ~189-206) routes through the `main` role (the `claude-code` child spawn) via `mcp__task-master-ai__parse_prd` / CLI `task-master parse-prd`; Step 6 (expand, lines ~232-278) shells `task-master expand --all`. The "Patience under slow providers" block (lines ~270-278) anticipates `claude-code` as the provider but only tells the agent to *wait* — there is **no branch for the spawn failing outright**. The listed "MCP fallback" `mcp__plugin_atlas-go_go__tm_parse_prd` points at a tool that does not exist (see below).
- **`server.py` does not implement the parse/expand tools.** `mcp-server/server.py` (lines 30-224) registers an 18-tool surface (`preflight`, `validate_setup`, `init_taskmaster`, `calc_tasks`, …) and does **not** define `parse_prd`, `expand_tasks`, `tm_parse_prd`, `tm_analyze_complexity`, or `tm_parallel_expand` — confirming the spawn is upstream and the SKILL.md `tm_parse_prd` fallback is dead.
- **Stale-tag pollution risk (secondary).** `mcp-server/pipeline.py:preflight` (lines 118-234) reads `currentTag` from `.taskmaster/state.json` and counts tasks per tag, but the `parse_prd` recommendation only fires when `tasks_count == 0` (line ~209). A stale-but-nonempty `master` tag full of prior `done` tasks does not recommend a fresh tag — `parse_prd` would append into the polluted tag.
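The stale-tag gap above comes down to one missing condition. Below is a minimal sketch of that condition. It assumes the tagged `tasks.json` layout shown in the comment, so verify the layout against the installed TM version first. `current_rule_recommends_parse` and `is_stale_tag` are hypothetical names and do not exist in `pipeline.py`.

```python
# Hypothetical helpers; assumes tasks.json uses the tagged layout
# {"<tag>": {"tasks": [{"id": ..., "status": ...}, ...]}}.

def _tag_tasks(tasks_by_tag: dict, tag: str) -> list:
    return tasks_by_tag.get(tag, {}).get("tasks", [])

def current_rule_recommends_parse(tasks_by_tag: dict, tag: str) -> bool:
    # existing preflight behaviour: only an empty tag triggers the parse_prd recommendation
    return len(_tag_tasks(tasks_by_tag, tag)) == 0

def is_stale_tag(tasks_by_tag: dict, tag: str) -> bool:
    # proposed: a non-empty tag whose tasks are all "done" should get a fresh tag instead
    tasks = _tag_tasks(tasks_by_tag, tag)
    return bool(tasks) and all(t.get("status") == "done" for t in tasks)

stale = {"master": {"tasks": [{"id": 1, "status": "done"}, {"id": 2, "status": "done"}]}}
print(current_rule_recommends_parse(stale, "master"))  # False: parse_prd would append here
print(is_stale_tag(stale, "master"))                   # True: recommend a fresh tag
```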

> **VERSION-SKEW WARNING (read before coding):** the live tools `mcp__atlas-engine__parse_prd` / `expand_tasks` / `tm_parallel_expand` / `engine_preflight` are exposed by a **newer/divergent engine build** that is NOT the source in this repo (this repo's `server.py` lacks them). Confirm which `server.py` backs the live `mcp__atlas-engine__*` tools before landing any *spawn-routing* change. The **setup / validate / skill-level guards in this task land in this repo**; the model-spawn routing (if changed) may need to land in the newer build.
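One quick way to check the skew is to list which live tool names a given `server.py` actually defines. Below is a minimal sketch using the standard-library `ast` module. It assumes each MCP tool is registered as a function named after the tool, which is the default naming for decorator-registered tools. Tools registered under an explicit override name would be missed. `LIVE_TOOLS`, `defined_functions`, and `missing_live_tools` are hypothetical names.

```python
import ast

# Tool names the live mcp__atlas-engine__* surface exposes (as listed in this issue).
LIVE_TOOLS = {"parse_prd", "expand_tasks", "tm_parallel_expand", "engine_preflight"}

def defined_functions(source: str) -> set:
    """Every function name (sync or async) defined anywhere in a Python source file."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }

def missing_live_tools(source: str) -> set:
    """Live tool names that this server.py does not define as functions."""
    return LIVE_TOOLS - defined_functions(source)
```

For example, run `missing_live_tools(Path("mcp-server/server.py").read_text())` against a checkout. A non-empty result suggests that this build is not the one backing the live tools.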

## Expected Behavior

When `parse_prd` / `expand_tasks` are invoked from inside a nested Claude Code / MCP session:

1. The engine **detects the nested context** (`CLAUDECODE` set or `CLAUDE_CODE_ENTRYPOINT == "cli"`) before any model-spawning call.
2. If the configured `main`/`fallback` provider is `claude-code` and `ANTHROPIC_API_KEY` is absent, the engine does **one** of the following:
   - (a) Re-route `main`+`fallback` to a key-based / HTTP provider: anthropic once an `ANTHROPIC_API_KEY` is supplied, otherwise the already-keyed perplexity.
   - (b) Surface a **clear, actionable error** instead of the opaque exit-1/empty-stderr.

   It must never silently produce no tasks.
3. `validate_setup` reports the configured provider as **not runnable in this context** (downgraded readiness) rather than passing on `bool(main_model)`, and its fix hint stops recommending bare `--claude-code` in a nested context.
4. `parse_prd` then **succeeds and writes tasks to a fresh tag** when invoked from inside a Claude Code / MCP session.
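Steps 1 and 2 reduce to a small decision taken before any spawn. Below is a minimal sketch of that decision:
- `is_nested` and `guard_action` are hypothetical names.
- The env is passed in rather than read globally. Normally it would be `os.environ`; passing it in keeps the logic testable.
- The anthropic branch relies on the key-present behaviour described under Approach 2.

```python
from typing import Mapping, Optional

def is_nested(env: Mapping[str, str]) -> bool:
    # step 1: either marker means we are inside a Claude Code session
    return bool(env.get("CLAUDECODE")) or env.get("CLAUDE_CODE_ENTRYPOINT") == "cli"

def guard_action(env: Mapping[str, str], main_provider: Optional[str]) -> str:
    """Decide what to do before parse_prd / expand_tasks spawn a model call."""
    if not is_nested(env) or main_provider != "claude-code":
        return "proceed"              # no recursive `claude -p` spawn at risk
    if env.get("ANTHROPIC_API_KEY"):
        return "reroute-anthropic"    # step 2(a): HTTP API, no spawn
    if env.get("PERPLEXITY_API_KEY"):
        return "reroute-perplexity"   # step 2(a): the research key is already present
    return "error"                    # step 2(b): actionable error, never an empty result
```

A caller would pass `os.environ` and the `models.main.provider` value read from `.taskmaster/config.json`.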

## Files to Touch

- `mcp-server/capabilities.py` — in `validate_setup` (298-450): add a "provider runnable in this context" check. Detect `CLAUDECODE`/`CLAUDE_CODE_ENTRYPOINT` + `models.main.provider == "claude-code"` + missing `ANTHROPIC_API_KEY` → mark `provider_main` failed (or downgraded), set `ready=false`, and replace the `--claude-code` fix hints (lines ~412, ~425) with a key-based re-route command.
- `skills/setup/SKILL.md` (Step 3 provider config, lines 98-128):
  - In a headless/nested/MCP context, prefer a key-based provider for `main`+`fallback`: anthropic if `ANTHROPIC_API_KEY` is set, else perplexity.
  - In that context, stop offering `--claude-code` (line ~125) as the default.
- `skills/generate/SKILL.md` — Step 4 parse (189-206) and Step 6 expand (232-278): add a nested-session + `claude-code` preflight before the spawn, and an explicit error-path that converts exit-1/empty-stderr into an actionable message or a provider re-route. Remove/repair the dead `mcp__plugin_atlas-go_go__tm_parse_prd` fallback ref (line ~201).
- `mcp-server/taskmaster.py` — `_build_env` (11-15): the single subprocess-env chokepoint; centralize nested-context detection + provider/env adjustment here if a guarded Python wrapper is added.
- `mcp-server/pipeline.py` — `preflight` / tag accounting (118-234, esp. `recommended_action` / `recommended_tag` at ~203-233): detect a stale-but-nonempty current tag and recommend a fresh tag so `parse_prd` does not append into a polluted `master`.
- `mcp-server/server.py` (30-224) — only if the chosen fix is a guarded Python `parse_prd` wrapper; that is where it would be registered. (Confirm version-skew first.)
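The error path for Step 4 / Step 6 is described above. It would also apply to a Python wrapper around `_build_env`, if one is added. Below is a minimal sketch of converting the exit-1/empty-stderr result into an actionable message. `run_guarded` and `NESTED_HINT` are hypothetical names. `runner` is injectable, so the branch can be tested without `task-master` installed.

```python
import subprocess

NESTED_HINT = (
    "task-master exited {code} with empty stderr inside a nested Claude Code session while "
    "models.main.provider is 'claude-code': the provider's `claude -p` child spawn is refused "
    "here. Re-route main/fallback to a key-based provider (e.g. `task-master models --set-main "
    "sonar-pro --set-fallback sonar`) or export ANTHROPIC_API_KEY."
)

def run_guarded(cmd, env, nested, main_provider, runner=subprocess.run):
    """Run a task-master command, turning the opaque nested-spawn failure into a clear error."""
    result = runner(cmd, env=dict(env), capture_output=True, text=True)
    if result.returncode == 0:
        return result.stdout
    # exit != 0 with nothing on stderr is the signature of the refused nested spawn
    if nested and main_provider == "claude-code" and not result.stderr.strip():
        raise RuntimeError(NESTED_HINT.format(code=result.returncode))
    raise RuntimeError(f"task-master exited {result.returncode}: {result.stderr.strip()}")
```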

## Researched Fix Approaches

### 1. [Recommended] — Nested-context preflight guard + re-route main/fallback to a key-based provider (confidence: 88%)
- **Library/Config:** `task-master-ai` CLI `task-master models --set-main=<modelId>` / `--set-fallback=<modelId>` (provider INFERRED from modelId); config key `.taskmaster/config.json → models.{main,fallback}.{provider,modelId}`. Env markers `CLAUDECODE`, `CLAUDE_CODE_ENTRYPOINT`.
- **Pattern:** In `validate_setup` (capabilities.py) and the setup/generate skills, add a "provider runnable in this context" check. If the session is nested AND `main.provider == "claude-code"` AND no `ANTHROPIC_API_KEY` is set, the child spawn WILL fail. In that case:
  - Downgrade readiness.
  - Re-route `main`+`fallback` to perplexity, which is already keyed, or to anthropic once a key is supplied.
  - Keep `research=perplexity`.
- **Why:** This is the only path that works headless/nested — perplexity/anthropic-API are plain HTTP (no `claude -p` child). Matches the run evidence exactly (exit 1, empty stderr, `ANTHROPIC_API_KEY` unset, perplexity research unaffected). The fixable surface is this plugin's preflight/validation, not the upstream spawn.
- **Risk:** perplexity `sonar` models are research-tuned — main-role PRD-structuring quality may drop vs sonnet. `modelId` strings drift across TM versions (`sonar-pro` vs `sonar-reasoning-pro`) — pin against the installed `task-master models --list`. Confirm WHICH `server.py` backs the live `mcp__atlas-engine__*` before landing spawn-routing there.
- **Implementation hint:**
```python
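import json
import os
from pathlib import Path

# detection (e.g. in capabilities.py / taskmaster._build_env)
nested = bool(os.environ.get("CLAUDECODE")) or os.environ.get("CLAUDE_CODE_ENTRYPOINT") == "cli"
cfg_path = Path(".taskmaster/config.json")
cfg = json.loads(cfg_path.read_text()) if cfg_path.exists() else {}
main_provider = cfg.get("models", {}).get("main", {}).get("provider")

cmd = None
if nested and main_provider == "claude-code":
    # the provider's `claude -p` child spawn is refused in a nested session;
    # re-route non-interactively (provider inferred from modelId, pin IDs via `models --list`)
    if os.environ.get("ANTHROPIC_API_KEY"):
        # anthropic API path: plain HTTP, no spawn
        cmd = ["task-master", "models", "--set-main", "claude-3-5-sonnet-20241022",
               "--set-fallback", "claude-3-haiku-20240307"]
    else:
        # perplexity path: PERPLEXITY_API_KEY is already used by the research role
        cmd = ["task-master", "models", "--set-main", "sonar-pro", "--set-fallback", "sonar"]
    # DO NOT pass --claude-code in a nested context
# if cmd: subprocess.run(cmd, check=True)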
```
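The "downgrade readiness" half of this approach could replace Check 5's `provider_ok = bool(main_model)`. Below is a minimal sketch of such a check, under these assumptions:
- `main_model` corresponds to `models.main.modelId`.
- The check result is a plain dict.
- `provider_runnable_check`, `ready`, and `REROUTE_HINT` are hypothetical names.
- The hint reuses the perplexity modelIds from the hint above.
- When `ANTHROPIC_API_KEY` is present, the check passes, matching the Approach 2 behaviour.

```python
from typing import Mapping

REROUTE_HINT = "task-master models --set-main sonar-pro --set-fallback sonar"

def provider_runnable_check(cfg: dict, env: Mapping[str, str]) -> dict:
    """Hypothetical replacement for Check 5's `provider_ok = bool(main_model)`."""
    main = cfg.get("models", {}).get("main", {})
    configured = bool(main.get("modelId"))  # roughly what Check 5 tests today
    nested = bool(env.get("CLAUDECODE")) or env.get("CLAUDE_CODE_ENTRYPOINT") == "cli"
    blocked = nested and main.get("provider") == "claude-code" and not env.get("ANTHROPIC_API_KEY")
    return {
        "name": "provider_main",
        "passed": configured and not blocked,
        # key-based re-route instead of the bare `--set-main sonnet --claude-code` hint
        "fix": REROUTE_HINT if blocked else None,
    }

def ready(checks: list) -> bool:
    return all(c["passed"] for c in checks)
```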

### 2. [Alternative] — Provide ANTHROPIC_API_KEY so task-master uses the Anthropic HTTP API instead of the spawn (confidence: 72%)
- **Library/Config:** `ANTHROPIC_API_KEY` env var; per upstream TM issue #1256, when `ANTHROPIC_API_KEY` is present TM uses the Anthropic API and **bypasses the claude-code CLI spawn**. Centralize in `taskmaster.py:_build_env`.
- **Pattern:** Keep `main` as anthropic (or claude-code), but guarantee `ANTHROPIC_API_KEY` is in the subprocess env. The "bug" (#1256: key takes precedence over claude-code config) is the cure in a nested context — HTTP call, no recursive `claude -p`.
- **Why:** Lowest-code-change unblock *when an Anthropic key is available*; eliminates the failing spawn without rewriting config.
- **Risk:** In the observed run `ANTHROPIC_API_KEY` was UNSET, so this only helps if a key can be supplied — otherwise fall to Approach 1's perplexity re-route. Incurs Anthropic API cost (vs free subscription). Behavior confirmed only via issue thread, not a versioned doc — version-sensitive.
- **Implementation hint:**
```python
# fix message when nested + claude-code + no key:
# "export ANTHROPIC_API_KEY=... (Anthropic API used instead of the claude-code spawn), or re-route main to perplexity"
# to set explicitly with key in env:
#   task-master models --set-main claude-3-5-sonnet-20241022
```
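Centralizing this in `_build_env` could look like the sketch below. It is minimal and built on these assumptions:
- `build_env` and `KEY_HINT` are hypothetical names.
- `extra_keys` stands in for whatever key source the plugin can actually read. That source is an assumption and is not specified here.
- The warning reuses the fix message above, and is raised only when `main` is `claude-code` in a nested session without a key.

```python
from typing import Mapping, Optional, Tuple

# Fix message taken from the hint above.
KEY_HINT = ("export ANTHROPIC_API_KEY=... (Anthropic API used instead of the claude-code spawn), "
            "or re-route main to perplexity")

def build_env(base: Mapping[str, str], extra_keys: Optional[Mapping[str, str]] = None,
              *, main_provider: Optional[str] = None) -> Tuple[dict, Optional[str]]:
    """Hypothetical variant of taskmaster._build_env that guarantees key forwarding."""
    env = dict(base)                           # normally os.environ
    env.pop("TASK_MASTER_PROJECT_ROOT", None)  # what _build_env already does today
    if extra_keys:
        # keys from a source the plugin controls (assumption); empty values are ignored
        env.update({k: v for k, v in extra_keys.items() if v})
    nested = bool(env.get("CLAUDECODE")) or env.get("CLAUDE_CODE_ENTRYPOINT") == "cli"
    needs_key = nested and main_provider == "claude-code" and not env.get("ANTHROPIC_API_KEY")
    return env, (KEY_HINT if needs_key else None)
```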

### 3. [Fallback] — Strip CLAUDECODE/CLAUDE_CODE_ENTRYPOINT from the spawn env (confidence: 28%)
- **Library/Config:** env override in `taskmaster.py:_build_env`; `CLAUDECODE`, `CLAUDE_CODE_ENTRYPOINT`.
- **Pattern:** Before spawning task-master, `env.pop("CLAUDECODE", None); env.pop("CLAUDE_CODE_ENTRYPOINT", None)` so the child `claude -p` no longer self-detects a parent session (mirrors the claude-agent-sdk `env={'CLAUDECODE':''}` workaround).
- **Why:** Most direct attempt to keep the free claude-code subscription path working headless; env-inheritance mechanism is verified.
- **Risk:** STRONGLY DISCOURAGED as a sole fix. (a) With `CLAUDECODE` unset inside a session, `claude --print` can produce NO output (silent empty result) — reproducing the exact "no tasks written" symptom (claude-code#29543). (b) The bundled `cli.js` has ADDITIONAL nested detection (parent-process scan / lockfile) beyond the env var, so it can still hang or exit empty. (c) Nested sessions "share runtime resources and will crash all active sessions" per the CLI's own error — risking the PARENT session. Only use gated behind a fallback to Approach 1.
- **Implementation hint:**
```python
import os

env = os.environ.copy()
env.pop("CLAUDECODE", None)
env.pop("CLAUDE_CODE_ENTRYPOINT", None)
# subprocess.run(["task-master", ...], env=env)  # NEVER rely on this alone
```
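If Approach 3 is tried at all, it should be gated. Below is a minimal sketch of the gating:
- The stripped-env run counts only if tasks were actually written, not merely because it exited 0. One example of evidence is the `tasks.json` mtime advancing, as in the acceptance criteria.
- Otherwise it falls back to Approach 1.
- `run_with_stripped_env`, `tasks_written`, and `reroute` are hypothetical names.
- This does not mitigate risk (c): the stripped child may still affect the parent session.

```python
import subprocess
from typing import Callable, Mapping

def run_with_stripped_env(cmd, base_env: Mapping[str, str], tasks_written: Callable[[], bool],
                          reroute: Callable[[], None], runner=subprocess.run) -> str:
    """Try Approach 3 once, then fall back to Approach 1; never trust exit 0 alone."""
    env = dict(base_env)
    env.pop("CLAUDECODE", None)
    env.pop("CLAUDE_CODE_ENTRYPOINT", None)
    result = runner(cmd, env=env, capture_output=True, text=True)
    # exit 0 with nothing written is the silent-empty case (claude-code#29543)
    if result.returncode == 0 and tasks_written():
        return "stripped-env"
    reroute()  # Approach 1: point main/fallback at a key-based provider
    result = runner(cmd, env=dict(base_env), capture_output=True, text=True)
    if result.returncode != 0 or not tasks_written():
        raise RuntimeError("parse/expand failed even after re-routing main/fallback")
    return "rerouted"
```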

## Reference

`task-master-ai` (claude-task-master by eyaltoledano) selects providers via `.taskmaster/config.json` → `models.{main,research,fallback}.{provider,modelId,maxTokens,temperature}`. Roles are set non-interactively with `task-master models --set-main=<modelId>` / `--set-research=<modelId>` / `--set-fallback=<modelId>`; provider is INFERRED from a known `modelId` or forced with flags (`--ollama`, `--openrouter`, `--codex-cli`, `--claude-code`). The `claude-code` provider is special: **no API key, no HTTP call** — it spawns the local `claude` CLI to borrow the Pro/Max session (`docs/examples/claude-code-usage.md`). That spawn is what breaks inside a nested session: the child inherits `CLAUDECODE=1` and the CLI hard-refuses ("Claude Code cannot be launched inside another Claude Code session … unset the CLAUDECODE environment variable"), producing exit 1 / empty stderr / no tasks. TM gives the Anthropic API path precedence when `ANTHROPIC_API_KEY` is set (issue #1256), so a present key sidesteps the spawn. The clean headless/MCP pattern: detect nested via `CLAUDECODE`/`CLAUDE_CODE_ENTRYPOINT` and prefer a key-based/HTTP provider for `main`+`fallback`, never `claude-code`. Sources: claude-task-master `docs/command-reference.md`, `docs/configuration.md`, `docs/examples/claude-code-usage.md`, issues #705/#1193/#1256; anthropics/claude-agent-sdk-python#573 (subprocess inherits `CLAUDECODE=1`, exact rejection, `env={'CLAUDECODE':''}` workaround); anthropics/claude-code#32618 (nesting env vars) and #29543 (`claude --print` empty output when `CLAUDECODE` unset — why the unset hack is unreliable).
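The role/provider layout described above can be checked mechanically. Below is a minimal sketch that lists which roles would hit the `claude -p` spawn. The config text is illustrative: the `sonnet` modelId mirrors the existing `--set-main sonnet --claude-code` hint, and `maxTokens`/`temperature` are omitted. Pin real modelIds via `task-master models --list`. `SPAWNING_PROVIDERS` and `spawning_roles` are hypothetical names, and only `claude-code` is treated as spawning here.

```python
import json

# Illustrative config.json content; modelIds are examples, not pinned values.
EXAMPLE_CONFIG = """
{
  "models": {
    "main":     {"provider": "claude-code", "modelId": "sonnet"},
    "research": {"provider": "perplexity",  "modelId": "sonar-pro"},
    "fallback": {"provider": "claude-code", "modelId": "sonnet"}
  }
}
"""

# Providers that shell out to a local CLI instead of making an HTTP call.
SPAWNING_PROVIDERS = {"claude-code"}

def spawning_roles(config_text: str) -> list:
    """Roles whose provider would spawn `claude -p` and therefore break when nested."""
    models = json.loads(config_text).get("models", {})
    return sorted(role for role, spec in models.items()
                  if spec.get("provider") in SPAWNING_PROVIDERS)

print(spawning_roles(EXAMPLE_CONFIG))  # ['fallback', 'main']
```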

## Acceptance Criteria

- [ ] **`parse_prd` succeeds and writes tasks to a fresh tag when invoked from inside a Claude Code / MCP session** (i.e. with `CLAUDECODE=1` in the environment): `.taskmaster/tasks/tasks.json` is created with the expected number of tasks under a non-polluted tag, and the call exits 0.
- [ ] `expand_tasks` / `task-master expand --all` likewise completes and lands subtasks (tasks.json mtime advances, subtasks present) when run nested.
- [ ] A new test asserts the nested-context detector returns `True` when `CLAUDECODE` is set OR `CLAUDE_CODE_ENTRYPOINT == "cli"`, and `False` when neither is set.
- [ ] `validate_setup`, run with `CLAUDECODE=1` set, `models.main.provider == "claude-code"`, and `ANTHROPIC_API_KEY` unset, returns `ready: false` with the `provider_main` (or a new provider-runnable) check failed, and the fix hint is a key-based re-route command — NOT bare `task-master models --set-main sonnet --claude-code`.
- [ ] When the nested guard re-routes, `.taskmaster/config.json` `models.main`/`models.fallback` are rewritten to a key-based/HTTP provider (anthropic if `ANTHROPIC_API_KEY` present, else perplexity), `models.research` stays perplexity, and the new modelIds exist in `task-master models --list`.
- [ ] If no re-route is possible (no usable key/provider), the parse/expand path surfaces an **actionable error message naming the nested-session + claude-code cause**, instead of exit-1/empty-stderr with no tasks.
- [ ] `pipeline.py:preflight` recommends a fresh tag (`recommended_action`/`recommended_tag`) when the current tag is non-empty but fully `done` (stale `master`), so `parse_prd` does not append into a polluted tag.
- [ ] The dead `mcp__plugin_atlas-go_go__tm_parse_prd` reference in `skills/generate/SKILL.md` is either removed or backed by an implemented tool in `server.py`.
- [ ] `grep -rE 'CLAUDECODE|CLAUDE_CODE_ENTRYPOINT' mcp-server/` returns at least one match in the guard implementation (it currently returns zero).
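Below is a minimal pytest-style sketch for the detector criterion. The env is passed as a mapping rather than read from `os.environ`. That way the tests do not depend on whether the test runner is itself nested, as it would be when an agent runs it inside Claude Code. `is_nested` is a hypothetical stand-in for the real detector, and the non-`cli` entrypoint value is a placeholder.

```python
from typing import Mapping

def is_nested(env: Mapping[str, str]) -> bool:
    """Hypothetical detector; the real one would live in capabilities.py or taskmaster.py."""
    return bool(env.get("CLAUDECODE")) or env.get("CLAUDE_CODE_ENTRYPOINT") == "cli"

def test_nested_when_claudecode_set():
    assert is_nested({"CLAUDECODE": "1"})

def test_nested_when_entrypoint_is_cli():
    assert is_nested({"CLAUDE_CODE_ENTRYPOINT": "cli"})

def test_not_nested_when_neither_set():
    assert not is_nested({})
    # an entrypoint other than "cli" (placeholder value) does not count on its own
    assert not is_nested({"CLAUDE_CODE_ENTRYPOINT": "placeholder-other"})
```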

## Complexity: M

## Trust Level: HINT (not specification)
The researched approaches above are starting points. Before implementing:
1. Verify the library/config exists as stated (`task-master models --list`, read `.taskmaster/config.json`, confirm `--set-main`/`--set-fallback` flags).
2. Resolve the VERSION-SKEW first: confirm which `server.py` backs the live `mcp__atlas-engine__*` tools (this repo's `server.py` lacks `parse_prd`/`expand_tasks`). The setup/validate/skill guards land in THIS repo; spawn-routing may land in the newer build.
3. Try the recommended approach (perplexity/anthropic re-route) — if it works in 1-2 attempts, use it.
4. If it fails, do NOT keep retrying — research why (modelId drift, provider inference), explore the alternatives. Avoid Approach 3 as a sole fix (silent-empty output, risks the parent session).
5. The acceptance criteria are the real spec, not the approach.
