feat: Kubernetes deployment (kustomize + sealed secrets) #69
Merged
Deploy the three docker-compose services (qdrant, backend, frontend) to
Kubernetes, modeled on ../sms-api/kustomize.
Structure:
- base/            StatefulSet+Service (qdrant) and Deployments+Services
                   (backend :8000, frontend :3000), wired to config/ ConfigMaps
                   and sealed secrets
- config/<env>     non-secret config -> backend-config/frontend-config ConfigMaps
- overlays/<env>   namespace, image tags, ingress, and the sealed-secret tooling
                   for vcell-ai-rke (prod), vcell-ai-rke-dev (dev), vcell-ai-local
                   (minikube)
- scripts/         sealed_secret_{backend,frontend,ghcr}.sh + build_and_push.sh
Sealed secrets: each overlay has a master secrets.sh that reads plaintext
secrets.dat (from secrets.dat.template) and emits three SealedSecret manifests
(backend-secrets, frontend-secrets, ghcr image-pull). secrets.dat and the
generated secret-*.yaml are gitignored.
Ingress serves frontend at / and proxies /api/* to the backend (prefix stripped
via rewrite-target, since FastAPI routes are at the root). Validated with
`kubectl kustomize` for all three overlays (13 objects each).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XpNzobVi83p3YKL9sqtGwZ
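The `/api` prefix stripping described in this commit can be sketched as an ingress-nginx Ingress. This is an illustration, not the manifest from the PR: the object name `backend-ingress` and the host appear elsewhere in this PR, while the Service name `backend`, the `nginx` ingress class, and the regex path are assumptions.

```yaml
# Hypothetical sketch; Service name, ingress class, and path regex are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend-ingress
  annotations:
    # Enable regex paths so capture groups can be used in rewrite-target.
    nginx.ingress.kubernetes.io/use-regex: "true"
    # $2 is the second capture group, i.e. everything after the /api prefix.
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx
  rules:
    - host: vcell-ai.cam.uchc.edu
      http:
        paths:
          - path: /api(/|$)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: backend   # assumed Service name
                port:
                  number: 8000
```

With a rule like this, a request for `/api/biomodel` would reach FastAPI as `/biomodel`. Because `rewrite-target` applies to every path in an Ingress, the frontend route at `/` has to live in a separate Ingress object.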
Clarify in the kustomize README why frontend-secrets (AUTH0_SECRET, AUTH0_CLIENT_SECRET) exist for a "frontend": the Next.js image is a trusted Node server (BFF, Authorization Code + PKCE server-side) plus an untrusted browser bundle that never receives the secrets. Note what would change if the frontend became a public SPA client.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XpNzobVi83p3YKL9sqtGwZ
- CI publishes version tags only (no :latest); pin all three overlays to the current published image tag 0.1.6.2 so pods can actually pull.
- Document that NEXT_PUBLIC_API_URL is baked into the frontend image at build time (CI hardcodes a NodePort URL), so the runtime config/*.env value is ignored. The README lists two options to make the image environment-portable; this is left as a follow-up since it touches build_containers.yml and the frontend Dockerfile.
Verified: server-side dry-run of the dev overlay against the live cluster validates all 13 objects; sealed-secret tooling (secrets.sh + kubeseal) works end-to-end.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XpNzobVi83p3YKL9sqtGwZ
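One way to express the tag pin is kustomize's built-in `images` transformer. This is a sketch only: the image names come from the PR summary and the tag from this commit, but whether the overlays use `images` or a patch, and whether both images share the tag, are assumptions.

```yaml
# overlays/<env>/kustomization.yaml (fragment; other fields omitted)
# Assumes both images are published under the same version tag.
images:
  - name: ghcr.io/virtualcell/vcell-ai-backend
    newTag: 0.1.6.2
  - name: ghcr.io/virtualcell/vcell-ai-frontend
    newTag: 0.1.6.2
```

Pinning an explicit tag avoids depending on `:latest`, which CI does not publish.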
The vcell-fluxcd ingresses (and the cluster's letsencrypt-prod ClusterIssuer, which is HTTP-01) issue one cert per host from a single Ingress. Our host is split across two Ingress objects (frontend + a backend one that only exists for the /api rewrite, since rewrite-target is ingress-wide). Both carried the cert-manager cluster-issuer annotation, which is redundant. Keep the issuer annotation on frontend-ingress only; backend-ingress still serves TLS via the same shared secret.
Verified on the cluster: backend-ingress issuer -> none, one Certificate remains.
Note: the dev cert stays pending until DNS for vcell-ai-dev.cam.uchc.edu resolves to the ingress (HTTP-01 self-check currently fails with "no such host"); this is independent of this change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XpNzobVi83p3YKL9sqtGwZ
Point the dev (vcell-ai-rke-dev) ingress at the letsencrypt-staging ClusterIssuer and a letsencrypt-staging-vcell-ai-dev-tls secret, so cert issuance during DNS/testing doesn't consume Let's Encrypt production rate limits. Prod overlay stays on letsencrypt-prod.
Verified on-cluster: the ACME challenge now targets acme-staging-v02.api.letsencrypt.org.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XpNzobVi83p3YKL9sqtGwZ
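The resulting dev TLS setup (issuer annotation on one Ingress, shared secret on both) could look roughly like the following. These are written as partial overlay patches with rules omitted. The object names, issuer, secret name, and host are taken from the commits above; applying them as patches rather than full manifests is an assumption.

```yaml
# Hypothetical overlay patches for vcell-ai-rke-dev; partial objects, rules omitted.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-ingress
  annotations:
    # Only this Ingress asks cert-manager for a certificate.
    cert-manager.io/cluster-issuer: letsencrypt-staging
spec:
  tls:
    - hosts:
        - vcell-ai-dev.cam.uchc.edu
      secretName: letsencrypt-staging-vcell-ai-dev-tls
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend-ingress
  # No cluster-issuer annotation: this Ingress reuses the secret issued above.
spec:
  tls:
    - hosts:
        - vcell-ai-dev.cam.uchc.edu
      secretName: letsencrypt-staging-vcell-ai-dev-tls
```

Using a distinct secret name for staging keeps a staging certificate from being mistaken for a production one if the issuer is switched later.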
Summary
Adds a Kubernetes deployment for VCell-AI under `kustomize/`, modeled on `../sms-api/kustomize`. Deploys the three `docker-compose.yml` services, qdrant, backend (FastAPI :8000), and frontend (Next.js :3000), with a base → config → overlays structure and sealed secrets.

Layout
Environments: `vcell-ai-rke` (prod, on-prem UCHC RKE), `vcell-ai-rke-dev` (dev), `vcell-ai-local` (minikube).
Images: `ghcr.io/virtualcell/vcell-ai-backend`, `ghcr.io/virtualcell/vcell-ai-frontend` (pulled via a sealed `ghcr-secret`).

Sealed secrets
Each overlay has a master `secrets.sh` that reads plaintext `secrets.dat` (created from the committed `secrets.dat.template`) and emits three SealedSecret manifests:

| Secret | Keys | Script |
| --- | --- | --- |
| `backend-secrets` | `AZURE_API_KEY`, `LANGFUSE_SECRET_KEY`, `LANGFUSE_PUBLIC_KEY`, `SUPABASE_SERVICE_ROLE_KEY` | `sealed_secret_backend.sh` |
| `frontend-secrets` | `AUTH0_SECRET`, `AUTH0_CLIENT_SECRET` | `sealed_secret_frontend.sh` |
| `ghcr-secret` | `.dockerconfigjson` | `sealed_secret_ghcr.sh` |

Only genuinely sensitive values are sealed; everything else (Azure endpoint/deployment names, Qdrant URL, Auth0 domain/audience, Langfuse host, hostnames) lives in the committed `config/<env>/*.env` ConfigMaps. `secrets.dat` and the generated `secret-*.yaml` are gitignored.

Ingress
Frontend at `/`, backend proxied at `/api/*` with the `/api` prefix stripped via `rewrite-target` (FastAPI routes live at the root, e.g. `/biomodel`, `/kb`, `/query`). Prod/dev use nginx + cert-manager (`letsencrypt-prod`) on `vcell-ai.cam.uchc.edu` / `vcell-ai-dev.cam.uchc.edu`; local uses plain HTTP on `vcell-ai.local` (minikube).

Validation
- `kubectl kustomize` renders cleanly for all three overlays (13 objects each); ConfigMap name-hashes correctly propagate into the deployments' `envFrom`.
- Shell scripts pass `bash -n`.
- `.gitignore` excludes `secrets.dat` and generated `secret-*.yaml` (no secret material committed).

Deploy (per environment)
Notes for reviewers
- Values in `config/<env>/*.env` (Azure endpoint, Auth0 domain/client-id, Supabase URL) must be filled in per deployment.
- `NEXT_PUBLIC_*` are inlined at Next.js build time, so `NEXT_PUBLIC_API_URL` must be set when building the frontend image, not just at runtime.
- `secret-*.yaml` are gitignored (cluster-specific; can't be generated without the target cluster's key), so an overlay only renders fully after `secrets.sh` is run. This mirrors the required-but-absent `secrets.dat` contract.

🤖 Generated with Claude Code
https://claude.ai/code/session_01XpNzobVi83p3YKL9sqtGwZ
Follow-ups
- `NEXT_PUBLIC_API_URL` is baked into the image at build time (touches `build_containers.yml` + frontend `Dockerfile`; out of scope here).