feat(gatewayapi): calico-system policy for namespaced data-plane proxies #4970
Open
electricjesus wants to merge 1 commit into
Since deploy.type=GatewayNamespace (tigera#4690) the data-plane envoy proxies run in each Gateway's own namespace, not calico-system. The only calico-system gateway policy selects the controller/certgen pods in calico-system, so the proxies match nothing and have no policy punching through a default-deny tier in the namespaces they now run in.

Add a GlobalNetworkPolicy selecting the EG proxy pods (label gateway.envoyproxy.io/owning-gateway-name) so it covers every Gateway namespace with no re-render: DNS plus xDS (18000) and Wasm (18002) egress to the controller in calico-system, and all inbound TCP so a managed Gateway serves traffic out of the box under a default-deny tier. Backend egress is left to the user, matching the controller policy.
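As a rough illustration of the policy this commit message describes, the manifest below sketches what such a GlobalNetworkPolicy could look like. It is not the manifest the operator renders. The policy name, tier, selector, and ports come from this PR; the DNS rule shape, the namespace selector for the controller, and the trailing Pass rule are assumptions made for the sketch.

```yaml
# Illustrative sketch only; the operator-rendered policy may differ.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: calico-system.envoy-gateway-proxy
spec:
  tier: calico-system
  # Matches every Envoy Gateway proxy pod, in any namespace.
  selector: has(gateway.envoyproxy.io/owning-gateway-name)
  types:
    - Ingress
    - Egress
  ingress:
    # All inbound TCP, so Gateway listeners serve traffic under default-deny.
    - action: Allow
      protocol: TCP
  egress:
    # DNS. Assumption: plain port-53 rules; the real policy may instead
    # target the cluster DNS pods.
    - action: Allow
      protocol: UDP
      destination:
        ports: [53]
    - action: Allow
      protocol: TCP
      destination:
        ports: [53]
    # xDS (18000) and Wasm (18002) to the controller in calico-system.
    # Assumption: the namespace is matched by its name label; the real
    # policy may also select the controller pods by label.
    - action: Allow
      protocol: TCP
      destination:
        namespaceSelector: projectcalico.org/name == 'calico-system'
        ports: [18000, 18002]
    # Assumption: unmatched egress (for example to backends) is handed to
    # later tiers. Shown as an explicit Pass rule; a tier-level default
    # action could achieve the same.
    - action: Pass
```

Port 18001 is deliberately absent: per the PR description it is the ratelimit path, which the proxy does not use.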
Description
Bug fix.
Since #4690 we run a single envoy-gateway controller in `calico-system` with `deploy.type=GatewayNamespace`. The data-plane proxies now run in each Gateway's own namespace, not in `calico-system`.
We render one gateway policy in the `calico-system` tier: `calico-system.envoy-gateway`. It lives in `calico-system` and selects the controller and certgen pods. The proxies carry a different label, `gateway.envoyproxy.io/owning-gateway-name`, so they match no policy in that tier. When a Gateway namespace runs a default-deny tier, the proxy there has nothing to let its traffic through. This is the same kind of miss as the conformance MetalLB pool that stayed pinned to `tigera-gateway` (projectcalico/calico#13095).
This adds a `GlobalNetworkPolicy`, `calico-system.envoy-gateway-proxy`, that selects the proxy pods by `has(gateway.envoyproxy.io/owning-gateway-name)`. A GNP covers any Gateway namespace, including ones created later, with no re-render. It allows:
- DNS egress.
- Egress on TCP 18000 (xDS) and 18002 (Wasm) to the controller in `calico-system`. The proxy dials the controller (see `envoyproxy/gateway` `internal/infrastructure/kubernetes/proxy/resource.go` and `internal/xds/bootstrap`). 18001 is the ratelimit path, not a proxy path, so it is left out.
- All inbound TCP, so a managed Gateway serves traffic out of the box under a default-deny tier.
How I tested it
I built this operator image and ran it on an OSS master cluster (eBPF, namespaced mode). I put a Gateway, an HTTPRoute, and an nginx backend in a normal user namespace, scoped a Calico default-deny to the proxy pod, and curled the proxy from inside the cluster.
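For readers who want to reproduce the setup, a default-deny scoped to the proxy pod can be sketched as a Calico NetworkPolicy that selects the proxy and lists both policy types with no rules. This is an assumed shape, not the exact policy used in the test; the policy name and the `demo` namespace are hypothetical.

```yaml
# Illustrative sketch only; name and namespace are hypothetical.
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: default-deny-proxy
  namespace: demo
spec:
  # Scope the deny to the Envoy Gateway proxy pods in this namespace.
  selector: has(gateway.envoyproxy.io/owning-gateway-name)
  # Both directions are policed, and no rules are listed, so any traffic
  # that reaches this tier without an earlier Allow is dropped.
  types:
    - Ingress
    - Egress
```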
I also tried the narrower ingress idea (allow only 19001, then Pass). Under default-deny it times out: listener traffic falls through to the user's deny and the Gateway stops serving. So allowing all inbound TCP is the right default. The cost is that an Allow is terminal in this tier, so a user cannot narrow proxy ingress with their own policy. Scaling the operator back up showed it reverts any drift on the GNP and the Gateway recovers.
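The narrower variant can be pictured as the ingress stanza below. It is reconstructed from the one-line description above, so treat the exact rule shape as an assumption rather than the rules that were tested.

```yaml
# Illustrative sketch of the rejected alternative (ingress stanza only).
ingress:
  # Allow only port 19001 on the proxy.
  - action: Allow
    protocol: TCP
    destination:
      ports: [19001]
  # Hand everything else, including Gateway listener traffic, to later tiers.
  - action: Pass
```

With this shape, listener traffic leaves the tier through the Pass rule and is then dropped by the user's default-deny, which matches the timeout observed in the test.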
One thing to know
The GNP covers the proxy's own needs: DNS, the control-plane link, and ingress. It does not open egress to backends, because backends are arbitrary user workloads. A user who runs default-deny in a Gateway namespace has to allow the proxy to reach their backend themselves. Until they do, the proxy is up and configured but returns 503 on the upstream. This matches how the controller policy already works, and is worth a line in the docs.
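A user-side policy that opens the backend path might look like the sketch below. Every specific in it is hypothetical: the policy name, the `demo` namespace, the `app == 'nginx'` backend label, and port 80 are placeholders for whatever the user's backend actually is. It also assumes the user's default-deny sits in the same tier and has no explicit Deny rule ordered ahead of this policy.

```yaml
# Illustrative sketch only; all names, labels, and ports are placeholders.
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-proxy-to-backend
  namespace: demo
spec:
  # Applies to the Envoy Gateway proxy pods in this namespace.
  selector: has(gateway.envoyproxy.io/owning-gateway-name)
  types:
    - Egress
  egress:
    # Let the proxy reach the backend pods on their serving port.
    - action: Allow
      protocol: TCP
      destination:
        selector: app == 'nginx'
        ports: [80]
```

If the backend namespace is itself under default-deny, the backend pods need a matching ingress rule as well.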
Release Note
For PR author
- `make gen-files` (n/a, no API change).
- `make gen-versions` (n/a).