fix: localdns readiness curl must bypass HTTP proxy on http-proxy clusters#8834
Open
yewmsft wants to merge 2 commits into
Conversation
…sters
On clusters configured with an HTTP proxy, the proxy environment is inherited
by localdns.service via systemd DefaultEnvironment. The LocalDNS readiness
health check curls the node-local listener (169.254.10.10:8181/ready), which
must not go through the proxy since a link-local address is not routable by an
external proxy.
CURL_COMMAND was a string invoked unquoted ($CURL_COMMAND), so any attempt to
pass --noproxy '*' underwent word-splitting without quote-removal and curl
received a literal 3-char arg '*', leaving the proxy bypass ineffective. The
readiness probe was then proxied, never returned OK, and node bootstrap (CSE)
blocked until timeout -- failing node provisioning whenever LocalDNS was enabled
behind an HTTP proxy. Reproduces regardless of outbound type (UDR or LB).
Fix: declare CURL_COMMAND as a bash array and expand it as "${CURL_COMMAND[@]}"
so arguments are passed verbatim with correct quoting, and add an explicit
--noproxy for the node-local listener IP plus connect/max timeouts. Update the
ShellSpec stubs to the array form accordingly.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Contributor
Pull request overview
This PR fixes LocalDNS readiness/health checks hanging on HTTP-proxy clusters by ensuring the curl probe to the node-local readiness endpoint bypasses proxy settings and that curl arguments are passed with correct shell quoting.
Changes:
- Convert `CURL_COMMAND` from a string to a bash array and expand it as `"${CURL_COMMAND[@]}"` to preserve argument boundaries.
- Add `--noproxy "${LOCALDNS_NODE_LISTENER_IP}"` plus curl timeouts to prevent proxying and avoid hangs during readiness checks.
- Update ShellSpec stubs to set `CURL_COMMAND` as an array to match the new calling convention.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| parts/linux/cloud-init/artifacts/localdns.sh | Switch curl invocation to array-based execution and force proxy bypass + timeouts for the link-local readiness probe. |
| spec/parts/linux/cloud-init/artifacts/localdns_spec.sh | Update tests to stub CURL_COMMAND as an array to align with the new implementation. |
Replace @SriHarsha001 with @saewoni as the localdns code owner across localdns.sh, localdns.service, localdns-delegate.conf, and localdns_spec.sh. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Member
Author
/azp run AKS Linux VHD Build - PR check-in gate
Azure Pipelines successfully started running 1 pipeline(s).
awesomenix
approved these changes
Jul 4, 2026
Summary
On AKS clusters configured with an HTTP proxy, enabling LocalDNS causes node bootstrap (CSE) to hang until timeout, failing node provisioning (create / scale / node-image upgrade / surge). Root cause is a shell-quoting bug in the LocalDNS readiness health check.
`localdns.service` inherits the system-wide proxy env (via systemd `DefaultEnvironment` on http-proxy clusters). The readiness probe curls the node-local listener `169.254.10.10:8181/ready`, which must not be sent through the proxy — a link-local address is not routable by an external HTTP proxy. `CURL_COMMAND` was a string invoked unquoted (`$($CURL_COMMAND)`). Word-splitting happens without quote-removal, so any attempt to pass `--noproxy '*'` reached curl as a literal 3-char argument `'*'`, leaving the proxy bypass ineffective. The probe was proxied, never returned `OK`, and CSE blocked until timeout.
Fix
- `CURL_COMMAND` as a bash array and expand as `"${CURL_COMMAND[@]}"` at both call sites (`wait_for_localdns_ready`, `start_localdns_watchdog`) so arguments are passed verbatim with correct quoting.
- `--noproxy "${LOCALDNS_NODE_LISTENER_IP}"` so the node-local readiness request always bypasses the proxy, plus `--connect-timeout`/`--max-time` so a probe can't hang.
- `wait_for_localdns_ready` ShellSpec stubs to array form (`CURL_COMMAND=(echo OK)`), required by the new expansion.
Validation
- noProxy.
- `make generate-testdata` — all `pkg/agent/...` tests pass, no generated-file diff (localdns.sh is not embedded in the CSE snapshot).
- `localdns_spec.sh`: 85 examples, 0 failures.
- `bash -n` syntax check on both files.
Test plan
- `localdns_spec.sh` passes (85/0)
- `pkg/agent/...` go tests pass, no snapshot drift
🤖 Generated with Claude Code
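The quoting failure that the commit message and the Summary above describe can be reproduced without a cluster. The bash script below is an added example, not code from the AgentBaker PR or from localdns.sh: `fake_curl` is a hypothetical stand-in for curl that models only the proxy decision (with `HTTP_PROXY` set, a request is proxied unless `--noproxy` names the target host or is a bare `*`), `wait_for_ready` is a simplified stand-in for the project's readiness loop, and the proxy URL is a constructed value. The listener IP and port are the ones the PR names.

```bash
#!/usr/bin/env bash
# Added example (not from the AgentBaker PR). fake_curl is a hypothetical
# stand-in for curl: with HTTP_PROXY set, a request to a host is proxied
# unless --noproxy names that host or is the bare "*". Every other flag
# (timeouts, URL ordering) is accepted and ignored.
fake_curl() {
    local noproxy="" url=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --noproxy) noproxy="$2"; shift ;;
            http://*)  url="$1" ;;
        esac
        shift
    done
    local host="${url#http://}"
    host="${host%%[:/]*}"
    if [ -z "${HTTP_PROXY:-}" ] || [ "$noproxy" = "*" ] || [ "$noproxy" = "$host" ]; then
        printf 'OK'
    else
        # Echo the --noproxy argument exactly as received, so the caller can
        # see which characters survived the shell expansion.
        printf 'PROXIED(noproxy=%s)' "$noproxy"
    fi
}

LOCALDNS_NODE_LISTENER_IP="169.254.10.10"   # node-local listener from the PR
READY_URL="http://${LOCALDNS_NODE_LISTENER_IP}:8181/ready"

# Buggy form from the PR description: CURL_COMMAND is a string expanded
# unquoted. Word splitting happens, quote removal does not, so the quotes
# around * reach curl as characters (a 3-char argument). The subshell with
# set -f only keeps pathname expansion from touching the '*' word.
probe_string_form() {
    local CURL_COMMAND="fake_curl --noproxy '*' $READY_URL"
    ( set -f; $CURL_COMMAND )
}

# Fixed form: CURL_COMMAND is an array expanded as "${CURL_COMMAND[@]}", so
# each element is one argv entry, and --noproxy is pinned to the listener IP
# instead of relying on a wildcard. The timeouts bound each attempt.
probe_array_form() {
    local CURL_COMMAND=(fake_curl --noproxy "$LOCALDNS_NODE_LISTENER_IP"
                        --connect-timeout 2 --max-time 5 "$READY_URL")
    "${CURL_COMMAND[@]}"
}

# Simplified readiness loop (stand-in for the project's wait function):
# returns 0 once the probe prints OK, 1 after max_attempts tries.
wait_for_ready() {
    local probe="$1" max_attempts="${2:-3}" i=1
    while [ "$i" -le "$max_attempts" ]; do
        [ "$("$probe")" = "OK" ] && return 0
        i=$((i + 1))
        sleep 0.1
    done
    return 1
}

# Demo: the same probe under a proxy environment, both expansions.
( export HTTP_PROXY="http://proxy.example.internal:3128"   # constructed value
  printf 'string form: %s\n' "$(probe_string_form)"
  printf 'array form:  %s\n' "$(probe_array_form)"
)
```

Under a proxy environment the string form hands `fake_curl` the three characters `'*'` and is reported as proxied, which is exactly why the real readiness loop never saw `OK`; the array form delivers the listener IP verbatim and passes. With `HTTP_PROXY` unset both forms pass, matching the PR's observation that the failure is specific to http-proxy clusters.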