feat(gatewayapi): calico-system policy for namespaced data-plane proxies #4970

Open
electricjesus wants to merge 1 commit into tigera:master from electricjesus:seth/gatewayapi-proxy-allow-policy

@electricjesus electricjesus commented Jun 26, 2026

Member

Description

Bug fix.

Since #4690 we run a single envoy-gateway controller in calico-system with deploy.type=GatewayNamespace. The data-plane proxies now run in each Gateway's own namespace, not in calico-system.

We render one gateway policy in the calico-system tier: calico-system.envoy-gateway. It lives in calico-system and selects the controller and certgen pods. The proxies carry a different label, gateway.envoyproxy.io/owning-gateway-name, so they match no policy in that tier. When a Gateway namespace runs a default-deny tier, the proxy there has nothing to let its traffic through. This is the same kind of miss as the conformance MetalLB pool that stayed pinned to tigera-gateway (projectcalico/calico#13095).

This adds a GlobalNetworkPolicy, calico-system.envoy-gateway-proxy, that selects the proxy pods by has(gateway.envoyproxy.io/owning-gateway-name). A GNP covers any Gateway namespace, including ones created later, with no re-render. It allows:

  • Egress to DNS, plus xDS (18000) and Wasm fetch (18002) to the controller in calico-system. The proxy dials the controller (see envoyproxy/gateway internal/infrastructure/kubernetes/proxy/resource.go and internal/xds/bootstrap). 18001 is the ratelimit path, not a proxy path, so it is left out.
  • Ingress on all inbound TCP. Listener ports are user-defined, so the proxy has to accept any port to work out of the box. This also covers the 19001 metrics scrape.
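
For orientation, a policy with the shape described above might look roughly like the manifest below. This is a sketch written from the description, not the object the operator renders. The unscoped DNS destination, the namespace-only match for the controller, and the trailing egress `Pass` are assumptions. The `Pass` is assumed so that backend egress falls through to later tiers, where the user can allow it.

```yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  # Calico expects the tier name as a prefix on tiered policy names.
  name: calico-system.envoy-gateway-proxy
spec:
  tier: calico-system
  # Matches proxy pods in any namespace, including ones created later.
  selector: has(gateway.envoyproxy.io/owning-gateway-name)
  types:
    - Ingress
    - Egress
  ingress:
    # Listener ports are user-defined, so all inbound TCP is allowed.
    # This also covers the 19001 metrics scrape.
    - action: Allow
      protocol: TCP
  egress:
    # DNS. Assumption: left unscoped here; a real policy would likely
    # narrow the destination to the cluster DNS pods.
    - action: Allow
      protocol: UDP
      destination:
        ports: [53]
    - action: Allow
      protocol: TCP
      destination:
        ports: [53]
    # xDS (18000) and Wasm fetch (18002) to the controller.
    # Assumption: matched by namespace only; 18001 is deliberately absent.
    - action: Allow
      protocol: TCP
      destination:
        namespaceSelector: projectcalico.org/name == 'calico-system'
        ports: [18000, 18002]
    # Assumption: hand all other egress (for example, backends) to later tiers.
    - action: Pass
```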

How I tested it

I built this operator image and ran it on an OSS master cluster (eBPF, namespaced mode). I put a Gateway, an HTTPRoute, and an nginx backend in a normal user namespace, scoped a Calico default-deny to the proxy pod, and curled the proxy from inside the cluster.

| Step | Setup | Result |
| --- | --- | --- |
| Baseline | no deny | HTTP 200 |
| Reproduce | default-deny, stock operator (no GNP) | times out, proxy fully blocked |
| This PR | default-deny, our operator (GNP) | HTTP 503: proxy reachable and routing, only the backend hop is denied |
| Backend allow | user allows proxy to backend | HTTP 200 |
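
A default-deny scoped to the proxy pod, as used in the Reproduce and This PR steps, could be expressed along these lines. This is an illustrative sketch, not the exact manifest from the test. The policy name, the `gateway-demo` namespace, and the `order` value are hypothetical.

```yaml
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: proxy-default-deny       # hypothetical name
  namespace: gateway-demo        # hypothetical Gateway namespace
spec:
  order: 1000                    # hypothetical; evaluated after lower-order policies
  # Scope the deny to the Envoy proxy pods only.
  selector: has(gateway.envoyproxy.io/owning-gateway-name)
  # Listing both types with no rules denies all ingress and egress for the
  # selected pods, unless an earlier policy or tier allows the traffic.
  types:
    - Ingress
    - Egress
```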

I also tried the narrower ingress idea (allow only 19001, then Pass). Under default-deny it times out: listener traffic falls through to the user's deny and the Gateway stops serving. So allowing all inbound TCP is the right default. The cost is that an Allow is terminal in this tier, so a user cannot narrow proxy ingress with their own policy. After I scaled the operator back up, it reverted the drift on the GNP and the Gateway recovered.
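
For comparison, the narrower variant that was tried and rejected would correspond to an ingress section roughly like this fragment of the GNP `spec`. It is reconstructed from the one-line description above, so the exact rules used in the experiment may have differed.

```yaml
# Fragment of spec, not a complete manifest.
ingress:
  # Allow only the metrics scrape port.
  - action: Allow
    protocol: TCP
    destination:
      ports: [19001]
  # Defer everything else to later tiers. Under a user default-deny,
  # listener traffic is then dropped and the Gateway stops serving.
  - action: Pass
```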

One thing to know

The GNP covers the proxy's own needs: DNS, the control-plane link, and ingress. It does not open egress to backends, because backends are arbitrary user workloads. A user who runs default-deny in a Gateway namespace has to allow the proxy to reach their backend themselves. Until they do, the proxy is up and configured but returns 503 on the upstream. This matches how the controller policy already works, and is worth a line in the docs.
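
As a rough illustration of the user-side allow, a namespaced Calico policy like the following would let the proxy reach a backend in the same namespace. Every name here is hypothetical: the namespace, the `app == 'nginx'` label, port 80, and the `order` value are placeholders for whatever the user's backend actually uses.

```yaml
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-proxy-to-backend   # hypothetical name
  namespace: gateway-demo        # hypothetical Gateway namespace
spec:
  # Hypothetical; must sort before the user's default-deny in the same tier.
  order: 100
  selector: has(gateway.envoyproxy.io/owning-gateway-name)
  types:
    - Egress
  egress:
    - action: Allow
      protocol: TCP
      destination:
        # Hypothetical backend label and port. With no namespaceSelector,
        # this matches pods in the policy's own namespace.
        selector: app == 'nginx'
        ports: [80]
```

If the backend pods are themselves under a default-deny, they would also need a matching ingress allow from the proxy.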

Release Note

Add a `calico-system`-tier GlobalNetworkPolicy for Envoy Gateway data-plane proxies so Gateways work under a default-deny tier when proxies run in per-Gateway namespaces (deploy.type=GatewayNamespace).

For PR author

  • Tests for change.
  • If changing pkg/apis/, run make gen-files (n/a, no API change).
  • If changing versions, run make gen-versions (n/a).

@marvin-tigera marvin-tigera added this to the v1.44.0 milestone Jun 26, 2026
@electricjesus electricjesus force-pushed the seth/gatewayapi-proxy-allow-policy branch from d3d69a5 to 311699b June 26, 2026 08:09
@electricjesus electricjesus marked this pull request as ready for review June 26, 2026 08:16
@electricjesus electricjesus requested a review from a team as a code owner June 26, 2026 08:16
Since deploy.type=GatewayNamespace (tigera#4690) the data-plane envoy proxies run
in each Gateway's own namespace, not calico-system. The only calico-system
gateway policy selects the controller/certgen pods in calico-system, so the
proxies match nothing and have no policy punching through a default-deny
tier in the namespaces they now run in.

Add a GlobalNetworkPolicy selecting the EG proxy pods (label
gateway.envoyproxy.io/owning-gateway-name) so it covers every Gateway
namespace with no re-render: DNS + xDS(18000)/Wasm(18002) egress to the
controller in calico-system, and all inbound TCP so a managed Gateway serves
traffic out of the box under a default-deny tier. Backend egress is left to
the user, matching the controller policy.
@electricjesus electricjesus force-pushed the seth/gatewayapi-proxy-allow-policy branch from 311699b to fe034ab June 26, 2026 10:13
@electricjesus electricjesus changed the title from "feat(gatewayapi): allow-tigera policy for namespaced data-plane proxies" to "feat(gatewayapi): calico-system policy for namespaced data-plane proxies" Jun 26, 2026