feat(anc): wire check-hotfix into node wrapper behind ENABLE_PROVISIONING_HOTFIX #8715
Draft
Devinwong wants to merge 17 commits into
…tfix M1 2.1a)
Adds a base (YYYYMM.DD) -> hotfix version (YYYYMM.DD.PATCH) map to the ANC hotfix config so a single config can pin hotfixes for multiple VHD bases at once, with default-deny for unlisted bases. The legacy single 'version' field is still honored when the map is empty for full backward compatibility.
- hotfixConfig gains Hotfixes map; resolveVersion() applies map-first then legacy fallback; hotfixBaseFromVersion() splits on '.' to preserve the leading-zero day so map keys match exactly.
- readHotfixConfig() added; readHotfixVersion() retained via it.
- downloadHotfix() resolves via the map and still gates through the unchanged shouldUpgradeToHotfix() patch-only-strictly-higher semantics.
- Unparseable current version with a map present fails open (no hotfix).
Part of the Provisioning-Hotfix / live-patching-controller ConfigMap design (M1).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…o-ops
Adds TestDownloadHotfix_MapMisconfiguredValueBaseSkips: when a hotfixes map entry's value base (YYYYMM.DD) does not match its key, resolveVersion selects it by key but shouldUpgradeToHotfix rejects it because the bases differ, so no wrong-base binary is installed. Locks in the default-safe behavior.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses PR review:
- downloadHotfix now logs and skips (returns nil) when the hotfix config is unreadable or invalid JSON, instead of returning an error. This honors the fail-open guarantee so a malformed config can never block provisioning.
- hotfixBaseFromVersion now rejects a present-but-empty patch segment (e.g. '202604.01.') so an obviously malformed current version never selects a map entry, matching the documented YYYYMM.DD.PATCH contract.
- Tests: replace TestDownloadHotfix_UnreadableFile with fail-open assertions, add TestDownloadHotfix_InvalidJSONFailsOpen, and cover the empty-patch case.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses PR review on the two fail-open tests so they prove the skip is specifically caused by the unreadable/invalid config, not an incidental version parse skip:
- Set a parseable, hotfix-eligible Version (202604.01.0) and configure aptSourcesDir so a readable/valid config would proceed to install and flip installCalled. The only reason install does not fire is the config-read failure (fail-open).
- Make the unreadable case robust cross-platform: if chmod 0000 is ineffective, replace the path with a directory so the read genuinely fails.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The legacy readHotfixVersion function had no production callers after downloadHotfix switched to readHotfixConfig + resolveVersion. Remove it and fold its forward-compat coverage into TestReadHotfixConfig.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
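The base matching and patch-only gating described in the commits above can be illustrated with a small sketch. The real logic lives in Go (hotfixBaseFromVersion, shouldUpgradeToHotfix); the POSIX sh functions hotfix_base and should_upgrade below are hypothetical stand-ins written only for illustration. Rejecting a version that has no patch segment at all is an assumption of this sketch, since the commits only mention the present-but-empty case.

```sh
#!/bin/sh
# Illustrative only: a shell rendering of the gating rules, not the Go code.

# Print the YYYYMM.DD base of a YYYYMM.DD.PATCH version, or fail.
# Splitting on '.' keeps the leading zero of the day ("01" stays "01").
hotfix_base() {
    version=$1
    case $version in
        *.*.*.*) return 1 ;;   # more than three segments
        *.*.*) ;;              # exactly three segments
        *) return 1 ;;         # assumption: a missing patch segment is rejected
    esac
    month=${version%%.*}
    rest=${version#*.}
    day=${rest%%.*}
    patch=${rest#*.}
    # A present-but-empty segment such as "202604.01." is rejected.
    [ -n "$month" ] && [ -n "$day" ] && [ -n "$patch" ] || return 1
    printf '%s.%s\n' "$month" "$day"
}

# Succeed only when the candidate shares the current base and carries a
# strictly higher numeric patch.
should_upgrade() {
    current=$1
    candidate=$2
    cur_base=$(hotfix_base "$current") || return 1
    cand_base=$(hotfix_base "$candidate") || return 1
    [ "$cur_base" = "$cand_base" ] || return 1
    cur_patch=${current##*.}
    cand_patch=${candidate##*.}
    case $cur_patch$cand_patch in *[!0-9]*) return 1 ;; esac
    [ "$cand_patch" -gt "$cur_patch" ]
}

should_upgrade 202604.01.0 202604.01.2 && echo "upgrade"
should_upgrade 202604.01.2 202604.01.2 || echo "skip: patch not strictly higher"
should_upgrade 202604.01.0 202605.01.3 || echo "skip: base mismatch"
hotfix_base "202604.01." >/dev/null || echo "skip: empty patch segment"
```

The base-mismatch case is the one exercised by TestDownloadHotfix_MapMisconfiguredValueBaseSkips: a map entry can be selected by key and still be refused by the gate.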
…g/laughing-pancake
Add a fail-open 'check-hotfix' CLI subcommand that reads the base->hotfix
pointer map from the live-patching-service (LPS) over the IMDS-attested SNI
path that is reachable pre-kubelet, and stages the resolved {hotfixes:{...}}
pointer to the path download-hotfix already reads. download-hotfix keeps its
unchanged patch-only, strictly-higher gating; check-hotfix only fetches and
writes the pointer.
- Raw net/http HTTPS GET (no client-go). TLS ServerName pinned to the LPS
SNI host while the TCP dial is forced to the apiserver FQDN (curl --resolve
trick); Authorization is the IMDS attested-data signature; the server cert
is verified against the cluster CA from the provision-config.
- FQDN + cluster CA come from the AKSNodeConfig ANC already parses (the only
credential source present pre-provisioning); caSource is logged.
- Shares the hotfixConfig parser/data contract with download-hotfix.
- Always exits 0; emits CheckHotfix telemetry (lpsRead, noHotfixForBase,
notEnrolled, customDataFallback, failed).
- A reachable LPS with nothing for this node (HTTP 401 pool-not-enrolled,
403, 404) is a benign no-op (notEnrolled): no overlay is staged and it is
never classified as a failure. Only transport/5xx failures fall back.
- PoC cold-start fallback reads a lenient top-level hotfixes object from the
node config when the LPS read fails (TODO: typed contract field).
- Injectable App fields (checkHotfixFetcher, fetchAttestedToken,
nodeConfigPath) for network-free unit tests.
- The LPS route + response schema are a planned-maintenance deliverable that
is not finalized; lpsHotfixPath is a clearly-marked placeholder with a TODO.
The IMDS/LPS client helpers mirror the connectivity prototype and should be
de-duplicated into a shared LPS client when that lands.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
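The status handling in the commit message above can be summarised as a small classifier. This is a sketch only: classify_lps_status is a hypothetical helper, the real client is Go net/http, and the use of an empty string or 000 to stand for a transport failure is a convention of this sketch. The commit message does not say how other status codes are treated, so they are reported as unspecified rather than guessed.

```sh
#!/bin/sh
# Illustrative only: maps an HTTP status to the behaviour the commit describes.
classify_lps_status() {
    case $1 in
        2[0-9][0-9]) echo "read" ;;             # pointer fetched; staged if the base has an entry
        401|403|404) echo "notEnrolled" ;;      # benign no-op, never a failure, no overlay staged
        5[0-9][0-9]|000|'') echo "fallback" ;;  # transport/5xx: try the node-config fallback
        *) echo "unspecified" ;;                # not covered by the commit message
    esac
}

for status in 200 401 403 404 503 000; do
    printf '%s -> %s\n' "$status" "$(classify_lps_status "$status")"
done
```

Whatever the classification, check-hotfix itself still exits 0; the outcome only affects telemetry and whether a pointer is staged.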
07b497b to 0d6f945
Add a default-off ANC_HOTFIX_ENABLED-gated call to the 2.1b check-hotfix subcommand in aks-node-controller-wrapper.sh, placed before the existing download-hotfix block since check-hotfix refreshes the hotfix pointer that block consumes. The call is fail-open and wrapped defensively so it can never block provisioning. When the flag is unset/non-true the wrapper behaves exactly as before (6-month VHD backward compat). Parameterize HOTFIX_JSON to match the existing path-var pattern and enable shellspec coverage of the download-hotfix branch. Add shellspec tests for flag off, flag on ordering, fail-open, and non-true value handling.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Clarify that the check-hotfix non-zero (fail-open) case also models a node whose VHD-baked binary predates 2.1b, where check-hotfix is an unknown subcommand.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match the design's EnableProvisioningHotfix aks-rp region toggle and AgentBaker's contract->env naming convention (EnableIMDSRestriction -> ENABLE_IMDS_RESTRICTION), so the toggle -> absvc -> ANC opt-in chain stays traceable. No behavior change; still default-off and fail-open.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The hotfix pointer read channel moved from the kube-system ConfigMap (apiserver + bootstrap token) to the LPS endpoint (IMDS-attested); the fetch/auth rewrite lives in 2.1b. The wrapper's check-hotfix -> download-hotfix call contract, the ENABLE_PROVISIONING_HOTFIX gate, and the fail-open semantics are unchanged - only the explanatory comment is updated to name the new read channel accurately.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
9b0f1fd to abdcd9f
Changes cached containers or packages on Windows VHDs
Please get a Windows SIG member to approve. The following diff file shows any additions or deletions from what will be cached on Windows VHDs, organised by VHD type.
diff --git a/vhd_files/2022-containerd-gen2.txt b/vhd_files/2022-containerd-gen2.txt
index db10c9e..c51a47f 100644
--- a/vhd_files/2022-containerd-gen2.txt
+++ b/vhd_files/2022-containerd-gen2.txt
@@ -122,0 +123 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.34.6-windows-hp
+mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.2-windows-hp
@@ -124 +124,0 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.3-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.4-windows-hp
@@ -129,0 +130 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/secrets-store/driver:v1.5.4
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.11-windows-hpc-1
@@ -131 +131,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.13-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.14-windows-hpc-1
@@ -133 +133,2 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.10-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.11-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.8-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.3-windows-hpc-1
@@ -135 +135,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.5-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.6-windows-hpc-1
@@ -137 +136,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.1-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.2-windows-hpc-1
diff --git a/vhd_files/2022-containerd.txt b/vhd_files/2022-containerd.txt
index 94de353..7312c49 100644
--- a/vhd_files/2022-containerd.txt
+++ b/vhd_files/2022-containerd.txt
@@ -122,0 +123 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.34.6-windows-hp
+mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.2-windows-hp
@@ -124 +124,0 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.3-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.4-windows-hp
@@ -129,0 +130 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/secrets-store/driver:v1.5.4
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.11-windows-hpc-1
@@ -131 +131,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.13-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.14-windows-hpc-1
@@ -133 +133,2 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.10-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.11-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.8-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.3-windows-hpc-1
@@ -135 +135,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.5-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.6-windows-hpc-1
@@ -137 +136,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.1-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.2-windows-hpc-1
diff --git a/vhd_files/2025-gen2.txt b/vhd_files/2025-gen2.txt
index d0ea692..36e3641 100644
--- a/vhd_files/2025-gen2.txt
+++ b/vhd_files/2025-gen2.txt
@@ -52,0 +53 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.34.6-windows-hp
+mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.2-windows-hp
@@ -54 +54,0 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.3-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.4-windows-hp
@@ -59,0 +60 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/secrets-store/driver:v1.5.4
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.11-windows-hpc-1
@@ -61 +61,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.13-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.14-windows-hpc-1
@@ -63 +63,2 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.10-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.11-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.8-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.3-windows-hpc-1
@@ -65 +65,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.5-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.6-windows-hpc-1
@@ -67 +66,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.1-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.2-windows-hpc-1
diff --git a/vhd_files/2025.txt b/vhd_files/2025.txt
index ab44d8b..b8873d5 100644
--- a/vhd_files/2025.txt
+++ b/vhd_files/2025.txt
@@ -52,0 +53 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.34.6-windows-hp
+mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.2-windows-hp
@@ -54 +54,0 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.3-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.4-windows-hp
@@ -59,0 +60 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/secrets-store/driver:v1.5.4
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.11-windows-hpc-1
@@ -61 +61,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.13-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.14-windows-hpc-1
@@ -63 +63,2 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.10-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.11-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.8-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.3-windows-hpc-1
@@ -65 +65,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.5-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.6-windows-hpc-1
@@ -67 +66,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.1-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.2-windows-hpc-1
8cd3a04 to fe768d2
2.1c - Wire check-hotfix into the node wrapper (shell only)
POC / M1 draft. Shell-only wiring for the Provisioning-Hotfix flow. No Go changes.
Why this exists (the scale-up gap it closes)
There are two writers of the hotfix pointer file
(/opt/azure/containers/aks-node-controller-hotfix.json):
(A) A write_files entry generated from hotfix/anc-hotfix-version.json into nodecustomdata.yml
(hotfix/anc_hotfix_generate.py). This lands in the VMSS custom data, i.e. the VMSS model.
(B) check-hotfix (this wiring), which pulls the pointer live at boot and refreshes the same
file before download-hotfix gates/installs.
With (A) alone, the pointer is only as fresh as the VMSS model. When an existing nodepool
scales up (autoscale or manual), the new instance boots from the nodepool's frozen model,
so it gets whatever hotfix pointer was baked in at the last model PUT - NOT the current
one. A hotfix published after that point silently misses exactly the newest nodes until a
control-plane reconcile refreshes every affected model. That stale-model-on-scale-up gap
is the core problem this design (and the PoC) targets.
(B) closes it:
check-hotfix runs in the wrapper's ExecStart and reads the pointer live
from the LPS endpoint (IMDS-attested, reachable pre-kubelet), decoupled from the VMSS
model. A scale-up node booting from a months-old model still converges to the current
hotfix state. The read channel is deliberately pre-kubelet / IMDS-attested (not
apiserver/ConfigMap) precisely so a brand-new scale-up node can fetch it before it has a
kubeconfig. (A) remains useful as the cold-start/offline default baked into the model;
(B) is the authoritative live override.
Enablement (where this sits in the rollout chain)
This env gate is the on-node terminal of the design's region-staged opt-in:
EnableProvisioningHotfix aks-rp toggle (AKS Toggles-as-code, per region) -> absvc
respects toggle -> ANC respects toggle. This PR implements only the last hop ("ANC
respects toggle"). The env var name mirrors the toggle/contract name to match the
existing contract->env convention (e.g. EnableIMDSRestriction -> ENABLE_IMDS_RESTRICTION),
keeping the chain traceable. Wiring absvc to render this var from a contract field is a
separate follow-up PR; the aks-rp toggle + toggle YAML live in the aks-rp repo. Until
those land, the var renders unset everywhere, so this change is inert (default-off).
Note: 2.1d (#8717) relaxes this env gate, moving the on/off decision into the Go binary
via the enable_provisioning_hotfix contract field (single source of truth). This PR
intentionally ADDS the gate; #8717 relaxes it, so each PR stays reviewable on its own.
What this does
Adds one call to the check-hotfix subcommand (added in 2.1b) inside
aks-node-controller-wrapper.sh, gated behind a new env flag
ENABLE_PROVISIONING_HOTFIX that is OFF by default.
check-hotfix reads the hotfix pointer from the LPS endpoint (IMDS-attested) and refreshes
$HOTFIX_JSON, which the existing download-hotfix block consumes, so it must run
first. The call is fail-open (the command always exits 0) and additionally wrapped
defensively so it can never block provisioning.
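The ordering described above can be sketched as follows. This is a simplified illustration, not the wrapper itself: anc is a hypothetical stand-in for "$BIN_PATH" that just echoes the subcommand it was given, the existing download-hotfix block is reduced to a single call, and the final provision step is a placeholder for whatever the wrapper runs next.

```sh
#!/bin/sh
# Hypothetical stand-ins so the sketch runs without the real binary.
anc() { printf '%s\n' "$1"; }
log() { printf 'log: %s\n' "$*"; }

run_wrapper_sketch() {
    # Gate: only the literal string "true" enables the new call.
    if [ "${ENABLE_PROVISIONING_HOTFIX:-}" = "true" ]; then
        # Fail-open: a non-zero exit is logged and otherwise ignored.
        if anc check-hotfix; then
            :
        else
            log "check-hotfix failed; continuing (fail-open)"
        fi
    fi
    # Existing steps, unchanged by the flag (internals elided in this sketch).
    anc download-hotfix
    anc provision
}

ENABLE_PROVISIONING_HOTFIX=true
run_wrapper_sketch
```

With the flag set to true the sketch prints check-hotfix, download-hotfix, provision in that order; with the flag unset or set to anything else, only the last two appear.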
Default-off / fail-open guarantee
When ENABLE_PROVISIONING_HOTFIX is unset, empty, or any value other than the literal
string true, the wrapper behaves EXACTLY as it does today. This preserves the
6-month VHD backward-compatibility window: older VHDs running newer CSE, and newer
VHDs running older CSE, are unaffected unless the flag is explicitly turned on.
Known-safe: old VHD + flag on
If ENABLE_PROVISIONING_HOTFIX=true ever reaches a node whose VHD-baked ANC binary predates
2.1b, "$BIN_PATH" check-hotfix is an unknown subcommand and exits non-zero. The
if ... else log "...continuing (fail-open)" fi wrapper swallows that error, so
provisioning still proceeds unchanged. This path is covered by shellspec case 4 below
(check-hotfix exits non-zero -> wrapper still provisions), which models the missing
subcommand. This matters for the 6-month VHD support window.
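A minimal way to see the old-VHD case in isolation, assuming a stand-in binary: old_anc below is a hypothetical function that mimics a binary with no check-hotfix subcommand by failing on anything except provision. The real shellspec case and the real binary's error text may differ.

```sh
#!/bin/sh
# Hypothetical stand-in for a VHD-baked binary that predates check-hotfix.
old_anc() {
    case $1 in
        provision) echo "provisioning"; return 0 ;;
        *) echo "unknown command: $1" >&2; return 1 ;;
    esac
}
log() { printf 'wrapper: %s\n' "$*"; }

boot_with_old_binary() {
    ENABLE_PROVISIONING_HOTFIX=true
    if [ "${ENABLE_PROVISIONING_HOTFIX:-}" = "true" ]; then
        # The unknown subcommand exits non-zero; the else branch absorbs it.
        if old_anc check-hotfix 2>/dev/null; then
            log "check-hotfix completed"
        else
            log "check-hotfix failed; continuing (fail-open)"
        fi
    fi
    old_anc provision
}

boot_with_old_binary
echo "exit status: $?"
```

The failure is logged, provisioning still runs, and the overall exit status comes from the provisioning step rather than from check-hotfix.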
Before / after flow
Flag off (default - unchanged):
Flag on (ENABLE_PROVISIONING_HOTFIX=true):
Notes
- check-hotfix takes no flags/args; it reads the AKSNodeConfig from its default on-node path internally to find the LPS endpoint (IMDS-attested), so the wrapper passes nothing.
- HOTFIX_JSON is parameterized as ${HOTFIX_JSON:-<default>} to match the existing BIN_PATH/CONFIG_PATH/NBC_CMD_PATH pattern and to allow shellspec to exercise the download-hotfix branch. Production default path is unchanged.
- check-hotfix writes defaultHotfixVersionPath (/opt/azure/containers/aks-node-controller-hotfix.json, hotfix.go) and download-hotfix reads the same constant. The wrapper's HOTFIX_JSON default is byte-identical, and the Go hotfixVersionPath override exists only for tests (no env/production override and check-hotfix takes no path flag), so the two never diverge on a node.
- Uses only POSIX constructs ([ ], =, ${VAR:-}); passes shellcheck generic + POSIX (SC3010/SC3014) and the wrapper shellspec suite (8 examples, 0 failures).
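The HOTFIX_JSON parameterization in the notes above relies on standard POSIX default expansion. A short sketch follows; resolve_hotfix_json is a hypothetical helper used only to show the expansion, and /tmp/spec/hotfix.json is an arbitrary example override, not a path from the test suite.

```sh
#!/bin/sh
# ${VAR:-default} yields the default when VAR is unset OR empty.
resolve_hotfix_json() {
    printf '%s\n' "${HOTFIX_JSON:-/opt/azure/containers/aks-node-controller-hotfix.json}"
}

unset HOTFIX_JSON
resolve_hotfix_json                 # production default path

HOTFIX_JSON=/tmp/spec/hotfix.json   # example override, as a test harness might set
resolve_hotfix_json
```

Because the colon form also covers the empty string, an accidentally empty HOTFIX_JSON still resolves to the production path.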
Tests
New shellspec cases in aks_node_controller_wrapper_spec.sh:
Stack
Base is set to the 2.1b branch so the diff shows only the wrapper + shellspec changes.
Will retarget to main as the stack merges down.
This unblocks the on-node e2e PoC tests (fail-open and multi-base) since check-hotfix
is otherwise never invoked at boot.