feat: Kubernetes deployment (kustomize + sealed secrets) #69

Merged: jcschaff merged 5 commits into main from feat/kubernetes-kustomize-deployment on Jul 1, 2026

@jcschaff jcschaff (Member) commented Jul 1, 2026

Summary

Adds a Kubernetes deployment for VCell-AI under kustomize/, modeled on ../sms-api/kustomize. Deploys the three docker-compose.yml services — qdrant, backend (FastAPI :8000), frontend (Next.js :3000) — with a base → config → overlays structure and sealed secrets.

Layout

kustomize/
├── base/            qdrant StatefulSet(+10Gi PVC), backend & frontend Deployments+Services
├── config/<env>/    non-secret config -> backend-config / frontend-config ConfigMaps
├── overlays/<env>/  namespace, image tags, ingress, secrets.sh, secrets.dat.template
└── scripts/         sealed_secret_{backend,frontend,ghcr}.sh + build_and_push.sh

Environments: vcell-ai-rke (prod, on-prem UCHC RKE), vcell-ai-rke-dev (dev), vcell-ai-local (minikube).

Images: ghcr.io/virtualcell/vcell-ai-backend, ghcr.io/virtualcell/vcell-ai-frontend (pulled via a sealed ghcr-secret).

Sealed secrets

Each overlay has a master secrets.sh that reads plaintext secrets.dat (created from the committed secrets.dat.template) and emits three SealedSecret manifests:

Sealed secret     | Keys                                                                              | Script
backend-secrets   | AZURE_API_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, SUPABASE_SERVICE_ROLE_KEY | sealed_secret_backend.sh
frontend-secrets  | AUTH0_SECRET, AUTH0_CLIENT_SECRET                                                 | sealed_secret_frontend.sh
ghcr-secret       | ghcr.io image-pull .dockerconfigjson                                              | sealed_secret_ghcr.sh

Only genuinely sensitive values are sealed; everything else (Azure endpoint/deployment names, Qdrant URL, Auth0 domain/audience, Langfuse host, hostnames) lives in the committed config/<env>/*.env ConfigMaps. secrets.dat and the generated secret-*.yaml are gitignored.
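
As a rough illustration of the sealing flow described above, the sketch below filters a secrets file down to the backend keys and pipes a client-side Secret through kubeseal. It is not the actual sealed_secret_backend.sh: the KEY=VALUE format of secrets.dat, the function names filter_keys and seal_backend, and the use of --from-env-file are all assumptions made for the example.

```shell
# Hypothetical sketch; assumes secrets.dat holds one KEY=VALUE pair per line.

# Print only the KEY=VALUE lines of a secrets file whose key is in the
# wanted list. Blank lines, comments, and lines without "=" are skipped.
filter_keys() {
  file="$1"; shift
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # blank line or comment
      *=*) ;;                # looks like KEY=VALUE, keep going
      *) continue ;;         # anything else is ignored
    esac
    key="${line%%=*}"        # text before the first "="
    for wanted in "$@"; do
      if [ "$key" = "$wanted" ]; then printf '%s\n' "$line"; fi
    done
  done < "$file"
}

# Build backend-secrets client-side and seal it for the target cluster.
# Needs kubectl, kubeseal, and access to the sealed-secrets controller.
seal_backend() {
  dat="$1"; namespace="$2"; out="$3"
  tmp="$(mktemp)"
  filter_keys "$dat" AZURE_API_KEY LANGFUSE_SECRET_KEY \
    LANGFUSE_PUBLIC_KEY SUPABASE_SERVICE_ROLE_KEY > "$tmp"
  kubectl create secret generic backend-secrets --namespace "$namespace" \
    --from-env-file="$tmp" --dry-run=client -o yaml \
    | kubeseal --format yaml > "$out"
  rm -f "$tmp"               # do not leave plaintext behind
}
```

A call such as `seal_backend secrets.dat vcell-ai-rke secret-backend.yaml` would then write the sealed manifest; the output file name here is only illustrative of the secret-*.yaml pattern.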

Ingress

Frontend at /, backend proxied at /api/* with the /api prefix stripped via rewrite-target (FastAPI routes live at the root, e.g. /biomodel, /kb, /query). Prod/dev use nginx + cert-manager (letsencrypt-prod) on vcell-ai.cam.uchc.edu / vcell-ai-dev.cam.uchc.edu; local uses plain HTTP on vcell-ai.local (minikube).
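
To make the prefix stripping concrete, here is a small sketch that emulates the rewrite with sed. It assumes the common ingress-nginx pattern of a path regex `/api(/|$)(.*)` combined with `rewrite-target: /$2`; the PR text does not show the exact regex used, and strip_api_prefix is a hypothetical helper, not part of the repository.

```shell
# Emulate the assumed ingress rewrite: /api(/|$)(.*) is rewritten to /$2.
# Paths that do not match are left unchanged (they would go to the frontend).
strip_api_prefix() {
  printf '%s\n' "$1" | sed -E 's#^/api(/|$)(.*)#/\2#'
}

strip_api_prefix /api/biomodel   # backend sees /biomodel
strip_api_prefix /api            # backend sees /
strip_api_prefix /apikeys        # no match, stays /apikeys
```

The `(/|$)` group is what keeps a frontend path like /apikeys from being captured by the backend rule.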

Validation

  • kubectl kustomize renders cleanly for all three overlays (13 objects each); ConfigMap name-hashes correctly propagate into the deployments' envFrom.
  • All shell scripts pass bash -n.
  • Confirmed .gitignore excludes secrets.dat and generated secret-*.yaml (no secret material committed).
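
The checks above could be scripted roughly as follows. This is a sketch rather than the procedure actually used: count_objects and validate_overlays are hypothetical names, and counting top-level `kind:` lines is a heuristic that assumes kustomize emits one top-level `kind:` per object.

```shell
# Count Kubernetes objects in a multi-document YAML stream on stdin by
# counting top-level "kind:" lines (indented, nested kinds are ignored).
count_objects() {
  grep -c '^kind:' || true   # grep exits 1 on zero matches; still prints 0
}

# Render every overlay, report its object count, then syntax-check scripts.
# Requires kubectl on PATH; root is the kustomize/ directory.
validate_overlays() {
  root="$1"
  for overlay in "$root"/overlays/*/; do
    n="$(kubectl kustomize "$overlay" | count_objects)"
    printf '%s: %s objects\n' "$overlay" "$n"
  done
  for script in "$root"/scripts/*.sh "$root"/overlays/*/*.sh; do
    [ -e "$script" ] || continue
    bash -n "$script" || return 1
  done
}
```

Running `git check-ignore -q secrets.dat` inside an overlay directory is one way to confirm the gitignore rule, since it exits 0 only when the path is ignored.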

Deploy (per environment)

cd kustomize/overlays/vcell-ai-rke
kubectl create namespace vcell-ai-rke
cp secrets.dat.template secrets.dat && $EDITOR secrets.dat   # gitignored
./secrets.sh                                                 # needs kubeseal + controller
kubectl apply -k .

Notes for reviewers

  • Placeholder values in config/<env>/*.env (Azure endpoint, Auth0 domain/client-id, Supabase URL) must be filled in per deployment.
  • NEXT_PUBLIC_* variables are inlined at Next.js build time, so NEXT_PUBLIC_API_URL must be set when building the frontend image, not just at runtime.
  • The sealed secret-*.yaml files are gitignored because they are cluster-specific (they can't be generated without the target cluster's sealing key), so an overlay only renders fully after secrets.sh has been run. This mirrors secrets.dat, which is likewise required but never committed.
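
For the NEXT_PUBLIC_API_URL note, a build along these lines would bake the URL into the image. This is a sketch under assumptions: it presumes the frontend Dockerfile declares `ARG NEXT_PUBLIC_API_URL` and exposes it to the Next.js build, that the build context is a directory named frontend, and that the URL shown is the intended one; build_frontend and DRY_RUN are hypothetical and not part of build_and_push.sh.

```shell
# Build the frontend image with the API URL supplied at build time.
# With DRY_RUN=1 the command is printed instead of executed.
build_frontend() {
  api_url="$1"; tag="$2"; context="${3:-frontend}"
  set -- docker build \
    --build-arg "NEXT_PUBLIC_API_URL=$api_url" \
    -t "ghcr.io/virtualcell/vcell-ai-frontend:$tag" "$context"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 build_frontend https://vcell-ai.cam.uchc.edu/api 0.1.6.2` prints the docker command without running it, which makes it easy to confirm the value that will be inlined.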

🤖 Generated with Claude Code

https://claude.ai/code/session_01XpNzobVi83p3YKL9sqtGwZ


Follow-ups

jcschaff and others added 3 commits July 1, 2026 12:13

Deploy the three docker-compose services (qdrant, backend, frontend) to
Kubernetes, modeled on ../sms-api/kustomize.

Structure:
- base/        StatefulSet+Service (qdrant) and Deployments+Services
               (backend :8000, frontend :3000), wired to config/ ConfigMaps
               and sealed secrets
- config/<env> non-secret config -> backend-config/frontend-config ConfigMaps
- overlays/<env> namespace, image tags, ingress, and the sealed-secret tooling
               for vcell-ai-rke (prod), vcell-ai-rke-dev (dev), vcell-ai-local
               (minikube)
- scripts/     sealed_secret_{backend,frontend,ghcr}.sh + build_and_push.sh

Sealed secrets: each overlay has a master secrets.sh that reads plaintext
secrets.dat (from secrets.dat.template) and emits three SealedSecret manifests
(backend-secrets, frontend-secrets, ghcr image-pull). secrets.dat and the
generated secret-*.yaml are gitignored.

Ingress serves frontend at / and proxies /api/* to the backend (prefix stripped
via rewrite-target, since FastAPI routes are at the root). Validated with
`kubectl kustomize` for all three overlays (13 objects each).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XpNzobVi83p3YKL9sqtGwZ

Clarify in the kustomize README why frontend-secrets (AUTH0_SECRET,
AUTH0_CLIENT_SECRET) exist for a "frontend": the Next.js image is a trusted
Node server (BFF, Authorization Code + PKCE server-side) plus an untrusted
browser bundle that never receives the secrets. Note what would change if the
frontend became a public SPA client.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XpNzobVi83p3YKL9sqtGwZ

- CI publishes version tags only (no :latest); pin all three overlays to the
  current published image tag 0.1.6.2 so pods can actually pull.
- Document that NEXT_PUBLIC_API_URL is baked into the frontend image at build
  time (CI hardcodes a NodePort URL), so the runtime config/*.env value is
  ignored — with two options to make the image environment-portable. Left as a
  follow-up since it touches build_containers.yml and the frontend Dockerfile.

Verified: server-side dry-run of the dev overlay against the live cluster
validates all 13 objects; sealed-secret tooling (secrets.sh + kubeseal) works
end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XpNzobVi83p3YKL9sqtGwZ

jcschaff and others added 2 commits July 1, 2026 13:15

The vcell-fluxcd ingresses (and the cluster's letsencrypt-prod ClusterIssuer,
which is HTTP-01) issue one cert per host from a single Ingress. Our host is
split across two Ingress objects (frontend + a backend one that only exists for
the /api rewrite, since rewrite-target is ingress-wide). Both carried the
cert-manager cluster-issuer annotation, which is redundant. Keep the issuer
annotation on frontend-ingress only; backend-ingress still serves TLS via the
same shared secret. Verified on the cluster: backend-ingress issuer -> none,
one Certificate remains.

Note: the dev cert stays pending until DNS for vcell-ai-dev.cam.uchc.edu
resolves to the ingress (HTTP-01 self-check currently fails with "no such host")
— independent of this change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XpNzobVi83p3YKL9sqtGwZ

Point the dev (vcell-ai-rke-dev) ingress at the letsencrypt-staging
ClusterIssuer and a letsencrypt-staging-vcell-ai-dev-tls secret, so cert
issuance during DNS/testing doesn't consume Let's Encrypt production rate
limits. Prod overlay stays on letsencrypt-prod. Verified on-cluster: the ACME
challenge now targets acme-staging-v02.api.letsencrypt.org.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XpNzobVi83p3YKL9sqtGwZ
@jcschaff jcschaff merged commit 9b73ab4 into main Jul 1, 2026
1 check passed
@jcschaff jcschaff deleted the feat/kubernetes-kustomize-deployment branch July 1, 2026 17:29