feat(mcp): SaaS tenant-identity spine — authenticated tenant, no header override, default-deny#66
Open
Basheirkh wants to merge 5 commits into
added 5 commits
June 27, 2026 11:52
…er override, default-deny

Phase 1 of multi-tenant SaaS (PLAN-saas-multitenant): close the isolation hole where the tenant came from a FREE x-nil-workspace header (anyone could read any workspace).

resolve_tenant gains a saas mode:
- the tenant is the AUTHENTICATED identity via an injected claim_resolver (verified JWT workspace claim);
- the workspace header may NOT override it;
- a BYO adapter-url is rejected (identity routes to the tenant's registered active adapter);
- a missing claim is default-deny.

Header-trust remains only for self-hosted/dev (multi_tenant without saas). build_remote_app/TenantToolsProvider thread saas + claim_resolver (env NIL_MCP_SAAS); SaaS FAILS CLOSED without a claim_resolver, so it can never silently fall back to header-trust.

Isolation conformance tests:
- tenant A and B route to different adapters;
- a header naming B under an A token is refused;
- BYO adapter-url refused;
- missing claim default-denied;
- unknown workspace has no adapter.

25 mcp/tenant tests green.

Remaining (follow-up): wire the production claim_resolver = keycloak JWKS JWT verifier; then tenant-scope the newer surfaces (intent router providers, export, automation, Hermes) + per-tenant quotas (Phases 2-3), and the encrypted per-tenant secret vault.
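The SaaS-mode rules in this commit can be sketched roughly as follows. This is an illustration only, not the code in the PR: the function name resolve_tenant_saas, the TenantDenied exception, the Tenant dataclass, and the active_adapters mapping are all hypothetical, and the real resolve_tenant signature may differ. Header names are assumed to be lower-cased already.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Hypothetical type: takes request headers, returns the verified workspace or None.
ClaimResolver = Callable[[Mapping[str, str]], Optional[str]]


class TenantDenied(Exception):
    """Raised when a request cannot be tied to an authenticated tenant."""


@dataclass(frozen=True)
class Tenant:
    workspace: str
    adapter_url: str


def resolve_tenant_saas(
    headers: Mapping[str, str],
    claim_resolver: Optional[ClaimResolver],
    active_adapters: Mapping[str, str],
) -> Tenant:
    # Fail closed: SaaS mode without a resolver must never fall back to header-trust.
    if claim_resolver is None:
        raise TenantDenied("saas mode requires a claim_resolver")

    # The tenant is the authenticated identity; a missing claim is default-deny.
    workspace = claim_resolver(headers)
    if not workspace:
        raise TenantDenied("missing or unverified workspace claim")

    # The workspace header may repeat the claim but may not override it.
    header_ws = headers.get("x-nil-workspace")
    if header_ws is not None and header_ws != workspace:
        raise TenantDenied("workspace header does not match authenticated tenant")

    # A bring-your-own adapter URL is rejected outright in SaaS mode.
    if "x-nil-adapter-url" in headers:
        raise TenantDenied("adapter-url override is not allowed in saas mode")

    # Identity routes to the tenant's registered active adapter.
    adapter_url = active_adapters.get(workspace)
    if adapter_url is None:
        raise TenantDenied("no active adapter registered for workspace")
    return Tenant(workspace=workspace, adapter_url=adapter_url)
```

In a test, the resolver can be a plain function that maps a known bearer token to a workspace; in production it would be the verified-JWT resolver described in the follow-up commits.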
…quotas

Builds on the identity spine (#66). Three self-contained, TDD'd kernel keystones for tenant management:
- SecretVault (nilscript/secrets/vault.py): per-tenant secrets (adapter creds + LLM key) encrypted at rest (Fernet); read BY TENANT only; storage-agnostic (injectable store); wrong key can't decrypt; tenants isolated. This is "save your secrets once" done securely.
- JWT claim verifier (nilscript/mcp/auth.py): the production claim_resolver for SaaS mode. It verifies the bearer JWT (sig/exp/iss/aud) and reads the workspace claim; forged/expired/missing → None → default-deny. Completes the spine's prod wiring (keycloak JWKS layers on top).
- Per-tenant quotas + rate limit (nilscript/governance_quota.py): token-bucket + daily volume caps per (tenant, kind); a noisy tenant is throttled without starving others (the 429 fairness lesson). Pure, injected clock, resume-safe.

32 tests green (vault roundtrip/encryption/isolation/wrong-key; JWT verify/forge/expiry/no-claim; rate-limit fairness; quota caps).
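As a rough illustration of the per-tenant token bucket with a daily cap, here is a minimal sketch with an injected clock. The class name TenantRateLimiter and its parameters are assumptions made for this example; they are not taken from nilscript/governance_quota.py, whose actual API may look quite different.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

SECONDS_PER_DAY = 86400


@dataclass
class TenantRateLimiter:
    rate: float                  # tokens refilled per second, per (tenant, kind)
    burst: float                 # bucket capacity
    daily_cap: int               # max admitted calls per (tenant, kind) per day
    clock: Callable[[], float]   # injected clock, so tests control time
    # (tenant, kind) -> (tokens remaining, timestamp of last update)
    _buckets: Dict[Tuple[str, str], Tuple[float, float]] = field(default_factory=dict)
    # (tenant, kind, day index) -> calls admitted that day
    _daily: Dict[Tuple[str, str, int], int] = field(default_factory=dict)

    def allow(self, tenant: str, kind: str) -> bool:
        now = self.clock()
        key = (tenant, kind)
        tokens, last = self._buckets.get(key, (self.burst, now))
        # Refill according to elapsed time, never beyond the burst capacity.
        tokens = min(self.burst, tokens + (now - last) * self.rate)

        day_key = (tenant, kind, int(now // SECONDS_PER_DAY))
        used_today = self._daily.get(day_key, 0)

        if tokens < 1.0 or used_today >= self.daily_cap:
            # Record the refill but admit nothing; other tenants are unaffected.
            self._buckets[key] = (tokens, now)
            return False

        self._buckets[key] = (tokens - 1.0, now)
        self._daily[day_key] = used_today + 1
        return True
```

Because every bucket is keyed by (tenant, kind), a tenant that exhausts its own bucket has no effect on the tokens available to any other tenant, which is the fairness property the commit describes.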
…ault
Onboarding for SaaS: a company is stood up in a single privileged call.
- store: tenant_secrets table + put/get/delete_secrets using the SecretVault (encrypted at rest,
Fernet, NIL_VAULT_KEY); vault disabled (fail-closed) when no key. Secrets keyed by workspace.
- app: POST /tenants/provision (workspace + secrets + adapter → store secrets encrypted, register +
activate adapter); GET /tenants/{ws}/secret/{name} (registry-token-gated server-to-server fetch of a
tenant's decrypted secret for the platform — never the browser, never logged).
Tests: one-call provision activates the adapter + stores secrets; secret read is token-gated; secrets
are ciphertext at rest (no plaintext key on disk); tenants isolated; auth + workspace required. 164
cp/registry/tenant/saas tests green.
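To illustrate the "encrypted at rest, keyed by workspace, fail-closed without a key" behaviour, here is a minimal in-memory sketch built on Fernet from the cryptography package. The class name TenantSecretVault, the VaultDisabled exception, and the dict-backed store are hypothetical stand-ins; the PR's SecretVault and the tenant_secrets table are not shown on this page and may differ.

```python
from typing import MutableMapping, Optional, Tuple

from cryptography.fernet import Fernet, InvalidToken


class VaultDisabled(Exception):
    """Raised when the vault has no key configured (fail-closed)."""


class TenantSecretVault:
    def __init__(
        self,
        key: Optional[bytes],
        store: Optional[MutableMapping[Tuple[str, str], bytes]] = None,
    ) -> None:
        # No key means no vault: every operation refuses rather than storing plaintext.
        self._fernet = Fernet(key) if key else None
        # Injectable store; a dict stands in for a database table here.
        self._store = store if store is not None else {}

    def _require_fernet(self) -> Fernet:
        if self._fernet is None:
            raise VaultDisabled("no vault key configured")
        return self._fernet

    def put(self, workspace: str, name: str, value: str) -> None:
        token = self._require_fernet().encrypt(value.encode("utf-8"))
        self._store[(workspace, name)] = token  # only ciphertext is stored

    def get(self, workspace: str, name: str) -> Optional[str]:
        fernet = self._require_fernet()
        token = self._store.get((workspace, name))
        if token is None:
            return None  # unknown secret, or a different tenant's key space
        try:
            return fernet.decrypt(token).decode("utf-8")
        except InvalidToken:
            return None  # wrong key or tampered ciphertext
```

Secrets are looked up by the (workspace, name) pair, so a read for one workspace can never return another workspace's entry, and a vault opened with the wrong key returns nothing rather than raising partial data.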
…oped durable layer

Closes the remaining SaaS items (kernel side):
- JWKS claim resolver (mcp/auth.py): production keycloak path. PyJWKClient fetches + caches signing keys, selects by kid (rotation-safe); fail-closed on any verify error. jwt_claim_resolver_from_env precedence: NIL_JWT_JWKS_URL > NIL_JWT_PUBLIC_KEY > NIL_JWT_HS_SECRET.
- Surface tenant-scoping: store.recent(workspace=) and store.pending(workspace=) (pending joins to its events' workspace, since approvals carries none); /api/events and /api/pending take ?workspace=. Conformance: tenant A's events/pending never include B's; operator view (no ws) sees all.
- Tenant-scoped durable layer (durable.py, Temporal-ready): tenant-prefixed deterministic workflow ids (idempotent + no cross-tenant collision), per-tenant Temporal namespace, TenantDurablePolicy (per-tenant rate + concurrency admission: the 429 fairness, durable edition). Worker integration is the separate Temporal build; this is the isolation layer it plugs into.

42 SaaS tests green (JWKS verify/forge, durable id/namespace isolation + per-tenant throttle, events/pending workspace scoping); 161 across cp/registry/tenant/mcp.
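One way to get tenant-prefixed, deterministic workflow ids is to hash a canonical form of the request and prefix it with the workspace. The sketch below assumes that approach; the function names, the id layout, and the "nil-" namespace prefix are invented for illustration and are not taken from durable.py. It also assumes workspace ids contain no ":" character.

```python
import hashlib
import json
from typing import Any, Mapping


def tenant_workflow_id(workspace: str, kind: str, payload: Mapping[str, Any]) -> str:
    """Build an id that is stable for identical requests and unique per tenant."""
    # Canonical JSON: sorted keys and fixed separators make the hash order-independent.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    # The workspace prefix keeps two tenants with the same payload from colliding.
    return f"{workspace}:{kind}:{digest}"


def tenant_namespace(workspace: str) -> str:
    """Map a workspace to its own namespace name (the prefix is an assumption)."""
    return f"nil-{workspace}"
```

Submitting the same request twice yields the same id, so a workflow engine that rejects duplicate ids gives idempotency without extra bookkeeping.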
…layer (Phase 6)

durable_temporal.py (optional; temporalio imported lazily): heavy/bulk governed writes run as DURABLE workflows, tenant-isolated and crash-safe:
- per-tenant Temporal namespace + deterministic tenant-scoped workflow id (idempotent, no cross-tenant collision);
- the NIL gate runs in an activity with a RetryPolicy → a throttled (429)/transient backend is retried durably, not dropped (the 429 fairness lesson, durable edition);
- register_executor injects the NIL propose→commit (real SDK in prod, a fake in tests);
- run_worker starts a worker on the tenant's namespace + task queue.

Verified END-TO-END with temporalio's in-process time-skipping server (no external infra): a backend throttled twice is retried and commits on attempt 3; workflow id is tenant-scoped + idempotent.

pyproject: [saas] (pyjwt[crypto] + cryptography) and [temporal] (temporalio) extras, wired into [dev].

148 durable/saas/tenant/cp/mcp tests green.
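The "throttled twice, commits on attempt 3" behaviour can be illustrated without Temporal using a plain in-process retry loop with exponential backoff. This is not the PR's implementation and it is not durable (state is lost if the process dies, which is exactly what the Temporal activity avoids). The names Throttled and run_with_retry and the backoff parameters are assumptions for this sketch.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class Throttled(Exception):
    """Stands in for a 429 or other transient backend failure."""


def run_with_retry(
    execute: Callable[[int], T],
    max_attempts: int = 5,
    initial_interval: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> T:
    delay = initial_interval
    for attempt in range(1, max_attempts + 1):
        try:
            # The executor receives the attempt number (1-based).
            return execute(attempt)
        except Throttled:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure, do not drop it silently
            sleep(delay)
            delay *= backoff
    raise RuntimeError("max_attempts must be at least 1")
```

A fake executor that raises Throttled on its first two calls and returns on the third reproduces the scenario from the commit message, with the injected sleep recording the backoff intervals instead of blocking.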
Summary
Phase 1 of multi-tenant SaaS (PLAN-saas-multitenant): close the isolation hole where a request's tenant came from the free x-nil-workspace header. Anyone could set it and read any workspace's data via registry routing.

resolve_tenant gains a SaaS mode:
- the tenant is the authenticated identity via an injected claim_resolver (verified JWT workspace claim)
- the x-nil-workspace header cannot override it (mismatch → refused)
- a BYO x-nil-adapter-url is rejected (identity routes to the tenant's registered active adapter)

Header-trust remains only for self-hosted/dev (multi_tenant without saas). build_remote_app/TenantToolsProvider thread saas + claim_resolver (env NIL_MCP_SAAS); SaaS fails closed without a resolver, so it can never silently fall back to header-trust.

Isolation conformance (the point)
25 mcp/tenant tests green; 105 green across mcp/tenant/intent/router.
Scope / honesty
This is the identity spine only (the decision we agreed: build this first). The production claim_resolver (keycloak JWKS JWT verifier) is the deployment wiring, intentionally not hand-rolled here. Remaining for full SaaS: tenant-scope the newer surfaces (intent router providers, export, automation, Hermes isolation), per-tenant quotas/rate-limits, and the encrypted per-tenant secret vault (decided: control-plane vault). Not merged = no deploy.