fix(auth): don't emit expired access tokens from print-access-token by thomhug · Pull Request #408 · ninech/nctl

thomhug · 2026-06-01T11:55:32Z

Problem

nctl auth print-access-token could hand back an expired access token (or, on the error path, an empty string) instead of refreshing or failing clearly. A client (nbill) sent an hours-expired token to a resource server with no verifier leeway (billy / zitadel) and got an opaque 401. A fresh token only appeared once kubelogin's own cache TTL flipped.

Two root causes:

Client.Token swallowed every error to "" — an "expired" error became an empty string on stdout, and callers couldn't distinguish expired from broken.
GetTokenString only trusted kubelogin's reported ExpirationTimestamp — it never checked the JWT exp itself, so a token kubelogin believed valid could slip through unverified.

Changes

api/client.go — add TokenE(ctx) (string, error); Token now delegates to it and logs the error on stderr instead of silently returning "".
auth/print_access_token.go — route through TokenE, so the command exits non-zero with a clear message instead of printing an empty/stale token.
api/login.go — GetTokenString now decodes the JWT exp claim itself (golang-jwt, unverified — only the expiry is needed). If the token is expired (with a small 10s leeway), it forces a kubelogin refresh via the credential plugin's ForceRefresh flag. If even the forced refresh yields an expired token, it returns an actionable error (access token expired on <RFC3339>, run "nctl auth login") rather than writing a stale token to stdout. Falls back to kubelogin's ExpirationTimestamp for non-JWT (opaque) tokens.
auth/oidc.go — adapted to the new GetToken(..., forceRefresh, out) signature (false for the client-go exec-plugin path, which keeps its own expiry handling).

Testing

Live e2e against a real Keycloak cache with an expired (−65h) ID token: the new binary refreshed it and emitted a fresh valid JWT (exit 0, cache updated) — happy path and refresh confirmed intact.
Deterministic table-driven tests (api/login_test.go) cover the paths the live run can't reach: valid / expired / missing exp, missing ExpirationTimestamp, opaque tokens, and the orchestration (force-refresh recovery, "still expired → clear error, no stale token", error propagation). The refresh orchestration was extracted into validToken(now, fetch) to make this testable without driving the real login flow.
go test -race, go vet, staticcheck, golangci-lint (0 issues), modernize all clean.

Known limitation

A pure machine-vs-issuer clock skew (local clock running behind) is not caught: kubelogin and this check both use the local time.Now(), so a token that is expired by the issuer's clock but not the local one is still considered valid. That case needs clock sync (NTP), not this change. The evidence here (kubelogin recovering once its cache TTL flipped) points to a stale-return path rather than skew, which this PR does address.

The client-go API exec path (nctl auth oidc) is intentionally left unhardened — there, client-go performs its own expiry handling.

print-access-token could return a stale access token (or, on error, an empty string) instead of refreshing or failing clearly. A client sent an hours-expired token to a resource server without leeway and got an opaque 401; a fresh token only appeared once kubelogin's cache TTL flipped on its own. Two root causes are addressed: - Client.Token swallowed every error to "", so an expired-token error became an empty string on stdout and callers couldn't tell "expired" from "broken". Add TokenE(ctx) (string, error) and route print-access-token through it so it exits non-zero with a clear message. - GetTokenString only trusted kubelogin's reported ExpirationTimestamp. It now decodes the JWT exp claim itself, and if the token is expired (with a small leeway) forces a kubelogin refresh via the credential plugin's ForceRefresh flag. If even the forced refresh yields an expired token, it returns an actionable error ("access token expired on <RFC3339>, run \"nctl auth login\"") instead of writing a stale token to stdout. The refresh orchestration is extracted into validToken(now, fetch) so the force-refresh and no-stale-token paths are covered by table-driven tests (valid/expired/missing exp, missing ExpirationTimestamp, opaque tokens). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

thomhug · 2026-06-05T23:27:59Z

Marking this draft — it's not yet clear this change is needed.

Investigating an intermittent billy HTTP 401 from nbill, the billy server log showed the rejection reason was an audience/nonce mismatch on a still-valid (non-expired) token, not an expired access token:

auth.error="audience is not valid: Audience must contain client_id \"billy-production\"
            nonce does not match: expected \"\" but was \"vtnfH86…\""

That failure mode is being addressed elsewhere:

billy will stop enforcing nonce on its Bearer verifiers (a resource server shouldn't validate the nonce) — fixes the observed flap.
nbill gets retry-on-401 + a per-request token source, so a token that expires while nbill is held open (e.g. a long wizard session) is refreshed transparently instead of being sent stale.

The expiry scenario this PR hardens against (print-access-token emitting a JWT-expired token while kubelogin's cache still considers it valid) is plausible given the ~5-minute token lifetime, but we have no captured log of an actual expiry-reason 401 — the "self-heals once kubelogin's cache TTL flips" symptom that motivated it is equally explained by the nonce mechanism above.

Holding as draft until we either capture a real expiry-reason 401 (then this is the right source-level fix), or confirm the client-side retry + billy nonce fix fully resolve it (then this may be unnecessary).

thomhug marked this pull request as draft June 5, 2026 23:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(auth): don't emit expired access tokens from print-access-token#408

fix(auth): don't emit expired access tokens from print-access-token#408
thomhug wants to merge 1 commit into
ninech:mainfrom
thomhug:fix/print-access-token-expiry

thomhug commented Jun 1, 2026

Uh oh!

thomhug commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

thomhug commented Jun 1, 2026

Problem

Changes

Testing

Known limitation

Uh oh!

thomhug commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant