Replace fluentd with fluent-bit in the operator #4910
Pull request overview
This PR completes the operator-side migration of the LogCollector from fluentd to fluent-bit (Calico Enterprise), including updated resource identities (namespace/workload names), rendered configuration shape, and updated policy/monitoring expectations across the rendering and controllers.
Changes:
- Replace fluentd-specific rendering/controller wiring with fluent-bit equivalents (names, ports, RBAC, ServiceMonitor behavior).
- Update expected Calico policies/selectors to reflect fluent-bit running in calico-system and using port 2020.
- Extend the LogCollector CRD/API to add calicoFluentBitDaemonSet and treat the deprecated fluentdDaemonSet as an alias during migration.
Reviewed changes
Copilot reviewed 47 out of 48 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pkg/render/tiers/tiers_test.go | Updates tier rendering tests to stop expecting a separate LogCollector namespace. |
| pkg/render/testutils/expected_policies/node_local_dns_ipv6.json | Removes legacy tigera-fluentd namespace from node-local-dns selector expectations. |
| pkg/render/testutils/expected_policies/node_local_dns_ipv4.json | Removes legacy tigera-fluentd namespace from node-local-dns selector expectations. |
| pkg/render/testutils/expected_policies/node_local_dns_dual.json | Removes legacy tigera-fluentd namespace from node-local-dns selector expectations. |
| pkg/render/testutils/expected_policies/linseed.json | Updates Linseed ingress policy sources from fluentd labels/namespace to fluent-bit in calico-system. |
| pkg/render/testutils/expected_policies/linseed_ocp.json | Same as above for OpenShift policy expectations. |
| pkg/render/testutils/expected_policies/linseed_ocp_dpi_enabled.json | Same as above for OpenShift + DPI-enabled scenario. |
| pkg/render/testutils/expected_policies/linseed_dpi_enabled.json | Same as above for DPI-enabled scenario. |
| pkg/render/testutils/expected_policies/guardian.json | Updates Guardian policy expectations to allow fluent-bit instead of fluentd. |
| pkg/render/testutils/expected_policies/guardian_ocp.json | Same as above for OpenShift policy expectations. |
| pkg/render/testutils/expected_policies/fluentbit_unmanaged.json | Updates fluent-bit “allow metrics” policy expectations (name/namespace/selector/port). |
| pkg/render/testutils/expected_policies/fluentbit_unmanaged_ocp.json | Same as above for OpenShift. |
| pkg/render/testutils/expected_policies/fluentbit_managed.json | Updates managed-cluster policy expectations for fluent-bit metrics access. |
| pkg/render/testutils/expected_policies/es-gateway.json | Updates ES gateway ingress policy sources from fluentd to fluent-bit. |
| pkg/render/testutils/expected_policies/es-gateway_ocp.json | Same as above for OpenShift. |
| pkg/render/testutils/expected_policies/dns.json | Removes legacy tigera-fluentd namespace from DNS policy selector expectations. |
| pkg/render/testutils/expected_policies/dns_ocp.json | Same as above for OpenShift DNS policy expectations. |
| pkg/render/nonclusterhost/nonclusterhost.go | Adds Linseed RBAC permissions for DNS logs in non-cluster-host flow. |
| pkg/render/nonclusterhost/nonclusterhost_test.go | Updates RBAC test expectations to include dnslogs. |
| pkg/render/monitor/monitor.go | Renames fluentd monitoring constants to fluent-bit and switches ServiceMonitor to plain HTTP fluent-bit metrics endpoint. |
| pkg/render/monitor/monitor_test.go | Updates monitor rendering tests for renamed ServiceMonitor and the removal of TLS/scheme relabeling. |
| pkg/render/manager.go | Updates manager-rendered NetworkPolicy destination service name and cluster-wide namespace list (includes a problematic entry noted in comments). |
| pkg/render/manager_test.go | Updates manager tests to expect fluent-bit service names. |
| pkg/render/logstorage/linseed/linseed.go | Updates Linseed policy source entity rule reference to fluent-bit. |
| pkg/render/logstorage/esgateway/esgateway.go | Updates ES gateway policy source entity rule reference to fluent-bit. |
| pkg/render/intrusion_detection.go | Updates comments to reflect fluent-bit consuming IDS logs. |
| pkg/render/guardian.go | Updates Guardian policy source entity rule reference to fluent-bit. |
| pkg/render/fluentd.go | Removes the fluentd renderer implementation. |
| pkg/render/fluentd_test.go | Removes fluentd rendering unit tests. |
| pkg/render/common/elasticsearch/service.go | Renames isFluentd flag to isFluentBit and updates related commentary for managed-cluster Linseed routing. |
| pkg/render/applicationlayer/applicationlayer.go | Updates comment about WAF logs being consumed by fluent-bit. |
| pkg/imports/crds/operator/operator.tigera.io_logcollectors.yaml | Adds calicoFluentBitDaemonSet and updates fluentdDaemonSet docs/enums for the migration (some descriptions still say “Fluentd”, noted in comments). |
| pkg/controller/tiers/tiers_controller.go | Stops treating LogCollector as its own namespace when computing tier namespace lists. |
| pkg/controller/monitor/prometheus.go | Watches fluent-bit ServiceMonitor name instead of fluentd. |
| pkg/controller/monitor/monitor_controller.go | Switches monitored TLS secret from fluentd to fluent-bit. |
| pkg/controller/monitor/monitor_controller_test.go | Updates controller tests to reference fluent-bit ServiceMonitor and secrets. |
| pkg/controller/logstorage/secrets/secret_controller.go | Updates upstream cert collection references from fluentd TLS secret to fluent-bit TLS secret. |
| pkg/controller/logstorage/secrets/secret_controller_test.go | Updates tests to create/check fluent-bit TLS secret scenarios. |
| pkg/controller/logcollector/logcollector_controller.go | Rewires controller from fluentd to fluent-bit rendering (secrets, filters, services, namespace ownership behavior). |
| pkg/controller/logcollector/logcollector_controller_test.go | Updates controller tests for fluent-bit DaemonSet naming, config-driven outputs, and non-creation of calico-system. |
| pkg/components/enterprise.go | Renames enterprise component definitions from fluentd to fluent-bit (linux/windows). |
| hack/gen-versions/enterprise.go.tpl | Updates version generation template keys/structs for fluent-bit components. |
| config/enterprise_versions.yml | Replaces fluentd component entries with fluent-bit equivalents. |
| api/v1/zz_generated.deepcopy.go | Adds deepcopy support for CalicoFluentBitDaemonSet on LogCollectorSpec. |
| api/v1/logcollector_types.go | Adds calicoFluentBitDaemonSet field and documents fluentdDaemonSet as deprecated alias. |
| api/v1/fluentd_daemonset_types.go | Updates override container/initContainer enums to fluent-bit naming (backward-compat concerns noted in comments). |
Files not reviewed (1)
- api/v1/zz_generated.deepcopy.go: Generated file
Pull request overview
Copilot reviewed 49 out of 50 changed files in this pull request and generated 1 comment.
Files not reviewed (1)
- api/v1/zz_generated.deepcopy.go: Generated file
Comments suppressed due to low confidence (2)
pkg/controller/monitor/monitor_controller.go:156
- Monitor no longer scrapes fluent-bit over mTLS (serviceMonitorFluentBit is plain HTTP), so the controller shouldn't watch or require the fluent-bit TLS secret. Keeping it here creates an unnecessary dependency on the LogCollector controller and can degrade Monitor reconciliation when LogCollector isn't installed (secret not found).
for _, secret := range []string{
certificatemanagement.CASecretName,
esmetrics.ElasticsearchMetricsServerTLSSecret,
monitor.PrometheusServerTLSSecretName,
render.FluentBitTLSSecretName,
render.NodePrometheusTLSServerSecret,
kubecontrollers.KubeControllerPrometheusTLSSecret,
render.EKSLogForwarderTLSSecretName,
} {
pkg/controller/monitor/monitor_controller.go:345
- Since fluent-bit metrics scraping is now plain HTTP, Monitor's trusted bundle no longer needs the fluent-bit certificate. Including it makes reconciliation fail with NotFound if LogCollector hasn't created the secret yet (or isn't installed).
for _, certificateName := range []string{
esmetrics.ElasticsearchMetricsServerTLSSecret,
render.FluentBitTLSSecretName,
render.NodePrometheusTLSServerSecret,
render.CalicoAPIServerTLSSecretName,
kubecontrollers.KubeControllerPrometheusTLSSecret,
} {
if err = addServiceMonitorFluentBitWatch(c); err != nil {
	return fmt.Errorf("failed to watch ServiceMonitor fluent-bit-metrics resource: %w", err)
}
Replace the fluentd DaemonSet with fluent-bit for log collection and forwarding. The LogCollector controller now renders the calico-fluent-bit DaemonSet (Linux and Windows) in calico-system, and pkg/render/fluentd.go is replaced by fluentbit.go.
- Ship fluent-bit logs to Linseed through its built-in http output.
- Rename the FluentdDaemonSet* API types to FluentBitDaemonSet* (fluentd_daemonset_types.go -> fluentbit_daemonset_types.go). Preserve the deprecated fluentdDaemonSet override field name/json tag as an alias and widen its enums to accept both the new calico-fluent-bit* names and the legacy fluentd names so existing LogCollector specs still validate; translateLegacyFluentdOverrides remaps the legacy names.
- Warn on invalid fluent-bit-filters ConfigMap content (e.g. left in fluentd <filter> syntax) instead of silently dropping it.
- Drop the bogus "calico-fluent-bit" entry from the manager cluster-wide namespace list; fluent-bit runs in calico-system, which is already listed.
- Regenerate deepcopy, the operator CRD and enterprise versions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jiawei Huang <jiawei@tigera.io>
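The "warn instead of silently dropping" behaviour for fluent-bit-filters content can be pictured roughly as below. This is a minimal sketch, not the operator's code: the helper name checkFilterContent, the regular expression, and the warning wording are all assumptions made for illustration. It only detects leftover fluentd-style directives such as <filter ...>.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fluentdDirective matches a fluentd-style opening directive such as
// "<filter **>" or "<match foo>" at the start of a line.
// The set of directive names is an assumption for this sketch.
var fluentdDirective = regexp.MustCompile(`(?m)^\s*<(filter|match|source|label)\b[^>]*>`)

// checkFilterContent returns a warning message when a ConfigMap value looks
// like fluentd syntax, or "" when nothing suspicious is found.
// Hypothetical helper; the real operator function is not shown in this PR page.
func checkFilterContent(key, content string) string {
	if strings.TrimSpace(content) == "" {
		return ""
	}
	if hit := fluentdDirective.FindString(content); hit != "" {
		return fmt.Sprintf("%s: found fluentd directive %q; expected a fluent-bit YAML filter list",
			key, strings.TrimSpace(hit))
	}
	return ""
}

func main() {
	legacy := "<filter **>\n  @type grep\n</filter>\n"
	yamlList := "- name: grep\n  match: '*'\n"

	// A non-empty result would be surfaced as a warning rather than ignored.
	fmt.Println(checkFilterContent("flow", legacy))
	fmt.Println(checkFilterContent("dns", yamlList) == "")
}
```

A caller would attach any non-empty result to the LogCollector status or log it, so a stale fluentd filter is visible to the user instead of quietly disappearing from the rendered config.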
Forward flow, DNS and policy-activity logs from non-cluster hosts through voltron to the in-cluster calico-fluent-bit http input, and on to Linseed.
- Grant dnslogs (alongside flowlogs and policyactivity) on the non-cluster-host ClusterRole so the minted host token passes voltron's SubjectAccessReview for the DNS ingestion path instead of 403ing.
- Set VOLTRON_LOG_COLLECTOR_CA_BUNDLE_PATH on the manager so voltron verifies the calico-fluent-bit http input's TLS server certificate against the trusted CA bundle it already mounts (the config default /etc/pki/tls/certs/ca.crt is not mounted, so the handshake otherwise fails).
- Pass NonClusterHost to the Windows fluent-bit configuration so the Linux and Windows renders produce the shared allow-calico-fluent-bit NetworkPolicy identically. Otherwise, on clusters with Windows nodes, the port-9880 ingress rule (voltron -> http input) flapped on every reconcile and intermittently dropped voltron's access. Adds a controller regression test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jiawei Huang <jiawei@tigera.io>
pkg/render/fluentbit.go had grown to ~1900 lines. Move the fluent-bit / EKS log-forwarder rendering into a new pkg/render/logcollector package, split across focused files (logcollector core, config, outputs, daemonset, rbac, networkpolicy, eks_log_forwarder) plus the moved tests.
A small set of symbols stays in package render (new pkg/render/logcollector.go) to avoid a render -> render/logcollector import cycle, since Guardian, Manager, compliance, apiserver, dex and intrusion detection reference them: the log-collector network-policy identity (FluentBitSourceEntityRule, EKSLogForwarderEntityRule, LogCollectorNamespace, the fluent-bit node names, FluentBitInputService), the shared Linseed-token constants, and the TrustedBundleVolume helper. The logcollector package aliases these. The shared pod helper setNodeCriticalPod is exported as SetNodeCriticalPod (matching its sibling SetClusterCriticalPod).
Pure code move; no behavior change. Build, vet, unit tests, format-check and gen-files all pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jiawei Huang <jiawei@tigera.io>
// as an alias for one release during the Fluentd → Fluent Bit migration;
// when both are set, CalicoFluentBitDaemonSet takes precedence.
// +optional
FluentdDaemonSet *FluentBitDaemonSet `json:"fluentdDaemonSet,omitempty"`
This is a breaking change since the container name has changed from fluentd to fluentbit.
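For context on the compatibility concern, the precedence rule in the field comment plus the legacy-name translation mentioned in the commit message could look roughly like this. It is a simplified sketch: Override, Spec, effectiveOverride, and the fluentd -> calico-fluent-bit mapping are illustrative assumptions, not the operator's actual types or the body of translateLegacyFluentdOverrides.

```go
package main

import "fmt"

// Override is a simplified stand-in for the DaemonSet override type;
// only container names are modelled here.
type Override struct {
	Containers []string
}

// Spec mirrors the two override fields on the LogCollector spec.
type Spec struct {
	CalicoFluentBitDaemonSet *Override
	FluentdDaemonSet         *Override // deprecated alias
}

// legacyToNew is an assumed mapping from legacy to new container names.
var legacyToNew = map[string]string{"fluentd": "calico-fluent-bit"}

// effectiveOverride applies the documented precedence: the new field wins
// when both are set; otherwise the deprecated alias is used with legacy
// container names translated. Hypothetical helper for illustration.
func effectiveOverride(s Spec) *Override {
	if s.CalicoFluentBitDaemonSet != nil {
		return s.CalicoFluentBitDaemonSet
	}
	if s.FluentdDaemonSet == nil {
		return nil
	}
	out := &Override{}
	for _, name := range s.FluentdDaemonSet.Containers {
		if renamed, ok := legacyToNew[name]; ok {
			name = renamed
		}
		out.Containers = append(out.Containers, name)
	}
	return out
}

func main() {
	legacyOnly := Spec{FluentdDaemonSet: &Override{Containers: []string{"fluentd"}}}
	fmt.Println(effectiveOverride(legacyOnly).Containers)
}
```

Under these assumptions an existing spec that still names the fluentd container keeps working through the alias, which is the case the widened enums are meant to cover.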
Description
New feature (Calico Enterprise): the LogCollector controller now deploys fluent-bit (calico-fluent-bit in calico-system) in place of fluentd, completing the log-collector migration on the operator side.
- tigera-fluentd → calico-system; DaemonSet/ServiceAccount fluentd-node → calico-fluent-bit; TLS secret → calico-fluent-bit-tls; metrics port 9081 → 2020 (fluent-bit's built-in HTTP server).
- calico-fluent-bit-conf/-windows), subPath-mounted on Linux and directory-mounted on Windows, started with -c. The render loads the Go plugins via plugins_file, defines parsers inline, applies the record-transform Lua filter, and inlines user-provided fluent-bit YAML filter lists. A rendered-config hash annotation rolls the pods on config-only changes.
- /var/log/calico/calico-fluent-bit; a pos-migrator init container (Linux and Windows) seeds offsets from the legacy fluentd .pos files and pre-creates the tailed directories. Windows tails the same log types the fluentd Windows variant shipped.
- :9880 HTTP input with client-certificate verification; the input Service is cleaned up when the resource is removed.
- eks-log-forwarder runs the fluent-bit image with a rendered in_eks → linseed pipeline and health probes (no startup init container; the input plugin resolves its resume point from Linseed).
- :2020/api/v1/health; the ServiceMonitor scrapes plain HTTP (fluent-bit's monitoring server has no TLS) with access restricted by the component NetworkPolicy, and legacy fluentd monitors are deleted.
- calico-system namespace (deleting the LogCollector must not garbage-collect it); the deprecated fluentdDaemonSet override is honored as an alias of the new calicoFluentBitDaemonSet field (with container-name translation); legacy tigera-fluentd resources (the namespace last) are cleaned up idempotently.

Testing: render, controller, and monitor unit suites updated/extended (ConfigMap-content assertions replacing the env-var assertions); the rendered configuration was validated against the real fluent-bit binary; the full migration was validated end-to-end on a test cluster: all log types flowing to Linseed/Elasticsearch, fluentd resources fully removed, tail-offset handover without re-shipping, NonClusterHost ingestion with client-certificate enforcement, and EKS/Windows render shapes verified.
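The rendered-config hash annotation mentioned in the description is a common way to roll pods when only a ConfigMap changes. A minimal sketch follows, assuming a SHA-256 digest and a made-up annotation key; the key name and the configHash and podAnnotations helpers are hypothetical and not taken from the operator.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// configHashAnnotation is a hypothetical annotation key used only in this sketch.
const configHashAnnotation = "hash.example.io/fluent-bit-config"

// configHash returns a hex digest of the rendered configuration text.
func configHash(rendered string) string {
	sum := sha256.Sum256([]byte(rendered))
	return hex.EncodeToString(sum[:])
}

// podAnnotations builds the pod-template annotations for a rendered config.
// Because the digest is part of the pod template, any change to the config
// changes the template and triggers a DaemonSet rollout.
func podAnnotations(rendered string) map[string]string {
	return map[string]string{configHashAnnotation: configHash(rendered)}
}

func main() {
	before := podAnnotations("service:\n  http_port: 2020\n")
	after := podAnnotations("service:\n  http_port: 2021\n")

	// Differing annotations mean the pod template differs, so pods are rolled.
	fmt.Println(before[configHashAnnotation] != after[configHashAnnotation])
}
```

Without such an annotation, a DaemonSet whose pod template is unchanged would keep running with the old config file until the pods restart for another reason.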
Release Note
For PR author
- make gen-files
- make gen-versions
For PR reviewers
A note for code reviewers - all pull requests must have the following:
- kind/bug if this is a bugfix.
- kind/enhancement if this is a new feature.
- enterprise if this PR applies to Calico Enterprise only.