fix(auth): don't emit expired access tokens from print-access-token#408
fix(auth): don't emit expired access tokens from print-access-token#408thomhug wants to merge 1 commit into
Conversation
print-access-token could return a stale access token (or, on error, an
empty string) instead of refreshing or failing clearly. A client sent an
hours-expired token to a resource server without leeway and got an opaque
401; a fresh token only appeared once kubelogin's cache TTL flipped on its
own.
Two root causes are addressed:
- Client.Token swallowed every error to "", so an expired-token error
became an empty string on stdout and callers couldn't tell "expired"
from "broken". Add TokenE(ctx) (string, error) and route
print-access-token through it so it exits non-zero with a clear message.
- GetTokenString only trusted kubelogin's reported ExpirationTimestamp.
It now decodes the JWT exp claim itself, and if the token is expired
(with a small leeway) forces a kubelogin refresh via the credential
plugin's ForceRefresh flag. If even the forced refresh yields an expired
token, it returns an actionable error
("access token expired on <RFC3339>, run \"nctl auth login\"") instead
of writing a stale token to stdout.
The refresh orchestration is extracted into validToken(now, fetch) so the
force-refresh and no-stale-token paths are covered by table-driven tests
(valid/expired/missing exp, missing ExpirationTimestamp, opaque tokens).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Marking this draft — it's not yet clear this change is needed. Investigating an intermittent That failure mode is being addressed elsewhere:
The expiry scenario this PR hardens against ( Holding as draft until we either capture a real expiry-reason 401 (then this is the right source-level fix), or confirm the client-side retry + billy nonce fix fully resolve it (then this may be unnecessary). |
Problem
nctl auth print-access-tokencould hand back an expired access token (or, on the error path, an empty string) instead of refreshing or failing clearly. A client (nbill) sent an hours-expired token to a resource server with no verifier leeway (billy / zitadel) and got an opaque401. A fresh token only appeared once kubelogin's own cache TTL flipped.Two root causes:
Client.Tokenswallowed every error to""— an "expired" error became an empty string on stdout, and callers couldn't distinguish expired from broken.GetTokenStringonly trusted kubelogin's reportedExpirationTimestamp— it never checked the JWTexpitself, so a token kubelogin believed valid could slip through unverified.Changes
api/client.go— addTokenE(ctx) (string, error);Tokennow delegates to it and logs the error on stderr instead of silently returning"".auth/print_access_token.go— route throughTokenE, so the command exits non-zero with a clear message instead of printing an empty/stale token.api/login.go—GetTokenStringnow decodes the JWTexpclaim itself (golang-jwt, unverified — only the expiry is needed). If the token is expired (with a small 10s leeway), it forces a kubelogin refresh via the credential plugin'sForceRefreshflag. If even the forced refresh yields an expired token, it returns an actionable error (access token expired on <RFC3339>, run "nctl auth login") rather than writing a stale token to stdout. Falls back to kubelogin'sExpirationTimestampfor non-JWT (opaque) tokens.auth/oidc.go— adapted to the newGetToken(..., forceRefresh, out)signature (falsefor the client-go exec-plugin path, which keeps its own expiry handling).Testing
api/login_test.go) cover the paths the live run can't reach: valid / expired / missingexp, missingExpirationTimestamp, opaque tokens, and the orchestration (force-refresh recovery, "still expired → clear error, no stale token", error propagation). The refresh orchestration was extracted intovalidToken(now, fetch)to make this testable without driving the real login flow.go test -race,go vet,staticcheck,golangci-lint(0 issues),modernizeall clean.Known limitation
A pure machine-vs-issuer clock skew (local clock running behind) is not caught: kubelogin and this check both use the local
time.Now(), so a token that is expired by the issuer's clock but not the local one is still considered valid. That case needs clock sync (NTP), not this change. The evidence here (kubelogin recovering once its cache TTL flipped) points to a stale-return path rather than skew, which this PR does address.The client-go API exec path (
nctl auth oidc) is intentionally left unhardened — there, client-go performs its own expiry handling.