Replace fluentd with fluent-bit in the operator#4910

Open
hjiawei wants to merge 4 commits into
tigera:master from
hjiawei:fluent-bit-deploy

Conversation

@hjiawei hjiawei commented Jun 10, 2026


Description

New feature (Calico Enterprise): the LogCollector controller now deploys fluent-bit (calico-fluent-bit in calico-system) in place of fluentd, completing the log-collector migration on the operator side.

  • Resource identity migration: namespace tigera-fluentd → calico-system; DaemonSet/ServiceAccount fluentd-node → calico-fluent-bit; TLS secret → calico-fluent-bit-tls; metrics port 9081 → 2020 (fluent-bit's built-in HTTP server).
  • Configuration is rendered in fluent-bit's YAML schema into per-OS ConfigMaps (calico-fluent-bit-conf/-windows), subPath-mounted on Linux and directory-mounted on Windows, started with -c. The render loads the Go plugins via plugins_file, defines parsers inline, applies the record-transform Lua filter, and inlines user-provided fluent-bit YAML filter lists. A rendered-config hash annotation rolls the pods on config-only changes.
  • Tail inputs use the producing components' real log paths, with SQLite offset DBs and filesystem buffering under /var/log/calico/calico-fluent-bit; a pos-migrator init container (Linux and Windows) seeds offsets from the legacy fluentd .pos files and pre-creates the tailed directories. Windows tails the same log types the fluentd Windows variant shipped.
  • The linseed output matches only Linseed-bound tags, authenticates with mTLS + the pod's ServiceAccount token, and retries without limit against bounded filesystem storage. S3/Splunk/Syslog outputs mirror fluentd's per-type fan-out (standard AWS credential env vars, endpoint scheme honored, syslog ships the whole record as JSON via a per-output Lua processor with TLS properly enabled).
  • NonClusterHost renders the :9880 HTTP input with client-certificate verification; the input Service is cleaned up when the resource is removed.
  • eks-log-forwarder runs the fluent-bit image with a rendered in_ekslinseed pipeline and health probes (no startup init container; the input plugin resolves its resume point from Linseed).
  • Probes hit :2020/api/v1/health; the ServiceMonitor scrapes plain HTTP (fluent-bit's monitoring server has no TLS) with access restricted by the component NetworkPolicy, and legacy fluentd monitors are deleted.
  • The LogCollector controller no longer creates/owns the calico-system namespace (deleting the LogCollector must not garbage-collect it); the deprecated fluentdDaemonSet override is honored as an alias of the new calicoFluentBitDaemonSet field (with container-name translation); legacy tigera-fluentd resources — the namespace last — are cleaned up idempotently.
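
The rendered-config hash annotation mentioned in the list above can be illustrated with a minimal sketch. It assumes the hash is a SHA-256 over the ConfigMap data and that the annotation key is the hypothetical `hash.example.invalid/fluent-bit-config`; neither detail is taken from the PR, and the operator's real helper may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashAnnotationKey is a hypothetical annotation name used only in this sketch.
const hashAnnotationKey = "hash.example.invalid/fluent-bit-config"

// configHash returns a stable SHA-256 digest over ConfigMap data.
// Keys are sorted so that Go's random map iteration order cannot change the result.
func configHash(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// Length-prefix each field so ("ab","c") and ("a","bc") hash differently.
		fmt.Fprintf(h, "%d:%s%d:%s", len(k), k, len(data[k]), data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// withConfigHash returns a copy of the pod-template annotations with the hash set.
// A changed hash changes the pod template, which makes the DaemonSet roll its pods.
func withConfigHash(annotations, configData map[string]string) map[string]string {
	out := make(map[string]string, len(annotations)+1)
	for k, v := range annotations {
		out[k] = v
	}
	out[hashAnnotationKey] = configHash(configData)
	return out
}

func main() {
	before := withConfigHash(nil, map[string]string{"fluent-bit.yaml": "service:\n  http_port: 2020\n"})
	after := withConfigHash(nil, map[string]string{"fluent-bit.yaml": "service:\n  http_port: 2021\n"})
	// The annotation differs, so a config-only change still alters the pod template.
	fmt.Println(before[hashAnnotationKey] != after[hashAnnotationKey])
}
```

Because a subPath-mounted ConfigMap is not refreshed inside a running container, tying the pod template to the config content is a common way to make config-only edits take effect.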

Testing: render, controller, and monitor unit suites updated/extended (ConfigMap-content assertions replacing the env-var assertions); the rendered configuration was validated against the real fluent-bit binary; the full migration was validated end-to-end on a test cluster — all log types flowing to Linseed/Elasticsearch, fluentd resources fully removed, tail-offset handover without re-shipping, NonClusterHost ingestion with client-certificate enforcement, and EKS/Windows render shapes verified.
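
To make the tail-offset handover concrete, here is a rough sketch of the reading half of such a migration. It assumes the legacy fluentd `.pos` files use the `in_tail` layout of one tab-separated line per file (path, hexadecimal byte offset, hexadecimal inode) and that an all-`f` offset marks a file fluentd stopped watching. The type and function names are hypothetical, the log path is illustrative, and writing the result into fluent-bit's SQLite offset DB is out of scope here.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// tailOffset is one resumable position recovered from a legacy .pos file.
type tailOffset struct {
	Path   string
	Offset uint64
	Inode  uint64
}

// unwatched is the offset assumed to mark files fluentd no longer tails.
const unwatched = ^uint64(0)

// parsePosFile parses pos-file content and returns the entries still being tailed.
func parsePosFile(content string) ([]tailOffset, error) {
	var out []tailOffset
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		fields := strings.Split(line, "\t")
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed pos line %q", line)
		}
		offset, err := strconv.ParseUint(fields[1], 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad offset in %q: %w", line, err)
		}
		inode, err := strconv.ParseUint(fields[2], 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad inode in %q: %w", line, err)
		}
		if offset == unwatched {
			continue // nothing to resume for this file
		}
		out = append(out, tailOffset{Path: fields[0], Offset: offset, Inode: inode})
	}
	return out, sc.Err()
}

func main() {
	pos := "/var/log/calico/flowlogs/flows.log\t00000000000000ff\t000000000000002a\n"
	entries, err := parsePosFile(pos)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%s offset=%d inode=%d\n", e.Path, e.Offset, e.Inode)
	}
}
```

Carrying the inode along with the offset matters because a seeded offset is only valid for the same file; after a rotation the path may point at a different inode.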

Release Note

The LogCollector now deploys fluent-bit (calico-fluent-bit in calico-system) in place of fluentd, with operator-rendered configuration and automatic migration of fluentd tail positions.

For PR author

  • Tests for change.
  • If changing pkg/apis/, run make gen-files
  • If changing versions, run make gen-versions

For PR reviewers

A note for code reviewers - all pull requests must have the following:

  • Milestone set according to targeted release.
  • Appropriate labels:
    • kind/bug if this is a bugfix.
    • kind/enhancement if this is a new feature.
    • enterprise if this PR applies to Calico Enterprise only.

@marvin-tigera marvin-tigera added this to the v1.43.0 milestone Jun 10, 2026
@hjiawei hjiawei added the kind/enhancement (New feature or request) and enterprise (Feature applies to enterprise only) labels Jun 10, 2026
@hjiawei hjiawei force-pushed the fluent-bit-deploy branch 2 times, most recently from e121ac4 to c81ad06 Compare June 11, 2026 20:04
@radTuti radTuti modified the milestones: v1.43.0, v1.44.0 Jun 12, 2026
@hjiawei hjiawei force-pushed the fluent-bit-deploy branch from c81ad06 to ff46afe Compare June 25, 2026 15:17
@hjiawei hjiawei marked this pull request as ready for review June 26, 2026 16:43
@hjiawei hjiawei requested review from a team and marvin-tigera as code owners June 26, 2026 16:43
@hjiawei hjiawei requested a review from Copilot June 26, 2026 16:43

Copilot AI left a comment


Pull request overview

This PR completes the operator-side migration of the LogCollector from fluentd to fluent-bit (Calico Enterprise), including updated resource identities (namespace/workload names), rendered configuration shape, and updated policy/monitoring expectations across the rendering and controllers.

Changes:

  • Replace fluentd-specific rendering/controller wiring with fluent-bit equivalents (names, ports, RBAC, ServiceMonitor behavior).
  • Update expected Calico policies/selectors to reflect fluent-bit running in calico-system and using port 2020.
  • Extend the LogCollector CRD/API to add calicoFluentBitDaemonSet and treat the deprecated fluentdDaemonSet as an alias during migration.

Reviewed changes

Copilot reviewed 47 out of 48 changed files in this pull request and generated 5 comments.

File Description
pkg/render/tiers/tiers_test.go Updates tier rendering tests to stop expecting a separate LogCollector namespace.
pkg/render/testutils/expected_policies/node_local_dns_ipv6.json Removes legacy tigera-fluentd namespace from node-local-dns selector expectations.
pkg/render/testutils/expected_policies/node_local_dns_ipv4.json Removes legacy tigera-fluentd namespace from node-local-dns selector expectations.
pkg/render/testutils/expected_policies/node_local_dns_dual.json Removes legacy tigera-fluentd namespace from node-local-dns selector expectations.
pkg/render/testutils/expected_policies/linseed.json Updates Linseed ingress policy sources from fluentd labels/namespace to fluent-bit in calico-system.
pkg/render/testutils/expected_policies/linseed_ocp.json Same as above for OpenShift policy expectations.
pkg/render/testutils/expected_policies/linseed_ocp_dpi_enabled.json Same as above for OpenShift + DPI-enabled scenario.
pkg/render/testutils/expected_policies/linseed_dpi_enabled.json Same as above for DPI-enabled scenario.
pkg/render/testutils/expected_policies/guardian.json Updates Guardian policy expectations to allow fluent-bit instead of fluentd.
pkg/render/testutils/expected_policies/guardian_ocp.json Same as above for OpenShift policy expectations.
pkg/render/testutils/expected_policies/fluentbit_unmanaged.json Updates fluent-bit “allow metrics” policy expectations (name/namespace/selector/port).
pkg/render/testutils/expected_policies/fluentbit_unmanaged_ocp.json Same as above for OpenShift.
pkg/render/testutils/expected_policies/fluentbit_managed.json Updates managed-cluster policy expectations for fluent-bit metrics access.
pkg/render/testutils/expected_policies/es-gateway.json Updates ES gateway ingress policy sources from fluentd to fluent-bit.
pkg/render/testutils/expected_policies/es-gateway_ocp.json Same as above for OpenShift.
pkg/render/testutils/expected_policies/dns.json Removes legacy tigera-fluentd namespace from DNS policy selector expectations.
pkg/render/testutils/expected_policies/dns_ocp.json Same as above for OpenShift DNS policy expectations.
pkg/render/nonclusterhost/nonclusterhost.go Adds Linseed RBAC permissions for DNS logs in non-cluster-host flow.
pkg/render/nonclusterhost/nonclusterhost_test.go Updates RBAC test expectations to include dnslogs.
pkg/render/monitor/monitor.go Renames fluentd monitoring constants to fluent-bit and switches ServiceMonitor to plain HTTP fluent-bit metrics endpoint.
pkg/render/monitor/monitor_test.go Updates monitor rendering tests for renamed ServiceMonitor and the removal of TLS/scheme relabeling.
pkg/render/manager.go Updates manager-rendered NetworkPolicy destination service name and cluster-wide namespace list (includes a problematic entry noted in comments).
pkg/render/manager_test.go Updates manager tests to expect fluent-bit service names.
pkg/render/logstorage/linseed/linseed.go Updates Linseed policy source entity rule reference to fluent-bit.
pkg/render/logstorage/esgateway/esgateway.go Updates ES gateway policy source entity rule reference to fluent-bit.
pkg/render/intrusion_detection.go Updates comments to reflect fluent-bit consuming IDS logs.
pkg/render/guardian.go Updates Guardian policy source entity rule reference to fluent-bit.
pkg/render/fluentd.go Removes the fluentd renderer implementation.
pkg/render/fluentd_test.go Removes fluentd rendering unit tests.
pkg/render/common/elasticsearch/service.go Renames isFluentd flag to isFluentBit and updates related commentary for managed-cluster Linseed routing.
pkg/render/applicationlayer/applicationlayer.go Updates comment about WAF logs being consumed by fluent-bit.
pkg/imports/crds/operator/operator.tigera.io_logcollectors.yaml Adds calicoFluentBitDaemonSet and updates fluentdDaemonSet docs/enums for the migration (some descriptions still say “Fluentd”, noted in comments).
pkg/controller/tiers/tiers_controller.go Stops treating LogCollector as its own namespace when computing tier namespace lists.
pkg/controller/monitor/prometheus.go Watches fluent-bit ServiceMonitor name instead of fluentd.
pkg/controller/monitor/monitor_controller.go Switches monitored TLS secret from fluentd to fluent-bit.
pkg/controller/monitor/monitor_controller_test.go Updates controller tests to reference fluent-bit ServiceMonitor and secrets.
pkg/controller/logstorage/secrets/secret_controller.go Updates upstream cert collection references from fluentd TLS secret to fluent-bit TLS secret.
pkg/controller/logstorage/secrets/secret_controller_test.go Updates tests to create/check fluent-bit TLS secret scenarios.
pkg/controller/logcollector/logcollector_controller.go Rewires controller from fluentd to fluent-bit rendering (secrets, filters, services, namespace ownership behavior).
pkg/controller/logcollector/logcollector_controller_test.go Updates controller tests for fluent-bit DaemonSet naming, config-driven outputs, and non-creation of calico-system.
pkg/components/enterprise.go Renames enterprise component definitions from fluentd to fluent-bit (linux/windows).
hack/gen-versions/enterprise.go.tpl Updates version generation template keys/structs for fluent-bit components.
config/enterprise_versions.yml Replaces fluentd component entries with fluent-bit equivalents.
api/v1/zz_generated.deepcopy.go Adds deepcopy support for CalicoFluentBitDaemonSet on LogCollectorSpec.
api/v1/logcollector_types.go Adds calicoFluentBitDaemonSet field and documents fluentdDaemonSet as deprecated alias.
api/v1/fluentd_daemonset_types.go Updates override container/initContainer enums to fluent-bit naming (backward-compat concerns noted in comments).
Files not reviewed (1)
  • api/v1/zz_generated.deepcopy.go: Generated file

Comment thread api/v1/fluentd_daemonset_types.go Outdated
Comment thread api/v1/fluentd_daemonset_types.go Outdated
Comment thread pkg/render/manager.go
Comment thread pkg/imports/crds/operator/operator.tigera.io_logcollectors.yaml Outdated
Comment thread pkg/imports/crds/operator/operator.tigera.io_logcollectors.yaml Outdated
@hjiawei hjiawei force-pushed the fluent-bit-deploy branch 2 times, most recently from 2403d11 to 98e7752 Compare June 29, 2026 18:10
@hjiawei hjiawei requested a review from Copilot June 29, 2026 19:18

Copilot AI left a comment


Pull request overview

Copilot reviewed 49 out of 50 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • api/v1/zz_generated.deepcopy.go: Generated file
Comments suppressed due to low confidence (2)

pkg/controller/monitor/monitor_controller.go:156

  • Monitor no longer scrapes fluent-bit over mTLS (serviceMonitorFluentBit is plain HTTP), so the controller shouldn't watch or require the fluent-bit TLS secret. Keeping it here creates an unnecessary dependency on the LogCollector controller and can degrade Monitor reconciliation when LogCollector isn't installed (secret not found).
	for _, secret := range []string{
		certificatemanagement.CASecretName,
		esmetrics.ElasticsearchMetricsServerTLSSecret,
		monitor.PrometheusServerTLSSecretName,
		render.FluentBitTLSSecretName,
		render.NodePrometheusTLSServerSecret,
		kubecontrollers.KubeControllerPrometheusTLSSecret,
		render.EKSLogForwarderTLSSecretName,
	} {

pkg/controller/monitor/monitor_controller.go:345

  • Since fluent-bit metrics scraping is now plain HTTP, Monitor's trusted bundle no longer needs the fluent-bit certificate. Including it makes reconciliation fail with NotFound if LogCollector hasn't created the secret yet (or isn't installed).
	for _, certificateName := range []string{
		esmetrics.ElasticsearchMetricsServerTLSSecret,
		render.FluentBitTLSSecretName,
		render.NodePrometheusTLSServerSecret,
		render.CalicoAPIServerTLSSecretName,
		kubecontrollers.KubeControllerPrometheusTLSSecret,
	} {
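
The failure mode described in these two suppressed comments is a hard dependency on a secret that another controller may never create. The sketch below is not what the PR does and is not the operator's API. It only illustrates one general way to avoid such a dependency, by splitting secrets into required and optional sets and skipping optional ones that do not exist. `secretGetter`, `errNotFound`, and the secret names are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a Kubernetes NotFound error in this sketch.
var errNotFound = errors.New("secret not found")

// secretGetter is a hypothetical lookup returning a certificate by secret name.
type secretGetter func(name string) (string, error)

// collectCertificates fails on any problem with a required secret, but silently
// skips optional secrets that are simply absent.
func collectCertificates(get secretGetter, required, optional []string) ([]string, error) {
	var certs []string
	for _, name := range required {
		cert, err := get(name)
		if err != nil {
			return nil, fmt.Errorf("required secret %q: %w", name, err)
		}
		certs = append(certs, cert)
	}
	for _, name := range optional {
		cert, err := get(name)
		if errors.Is(err, errNotFound) {
			continue // the owning component is not installed (yet)
		}
		if err != nil {
			return nil, fmt.Errorf("optional secret %q: %w", name, err)
		}
		certs = append(certs, cert)
	}
	return certs, nil
}

func main() {
	store := map[string]string{"node-prometheus-tls": "cert-a"}
	get := func(name string) (string, error) {
		if c, ok := store[name]; ok {
			return c, nil
		}
		return "", errNotFound
	}
	certs, err := collectCertificates(get, []string{"node-prometheus-tls"}, []string{"log-collector-tls"})
	fmt.Println(certs, err)
}
```

Dropping the entry entirely, as the comments suggest, is simpler when the certificate is no longer used at all; the optional-set approach only makes sense when the secret is still consumed whenever it exists.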

Comment on lines +105 to 107
if err = addServiceMonitorFluentBitWatch(c); err != nil {
	return fmt.Errorf("failed to watch ServiceMonitor fluent-bit-metrics resource: %w", err)
}
hjiawei and others added 3 commits June 29, 2026 14:07
Replace the fluentd DaemonSet with fluent-bit for log collection and
forwarding. The LogCollector controller now renders the calico-fluent-bit
DaemonSet (Linux and Windows) in calico-system, and pkg/render/fluentd.go is
replaced by fluentbit.go.

- Ship fluent-bit logs to Linseed through its built-in http output.
- Rename the FluentdDaemonSet* API types to FluentBitDaemonSet*
  (fluentd_daemonset_types.go -> fluentbit_daemonset_types.go). Preserve the
  deprecated fluentdDaemonSet override field name/json tag as an alias and
  widen its enums to accept both the new calico-fluent-bit* names and the
  legacy fluentd names so existing LogCollector specs still validate;
  translateLegacyFluentdOverrides remaps the legacy names.
- Warn on invalid fluent-bit-filters ConfigMap content (e.g. left in fluentd
  <filter> syntax) instead of silently dropping it.
- Drop the bogus "calico-fluent-bit" entry from the manager cluster-wide
  namespace list; fluent-bit runs in calico-system, which is already listed.
- Regenerate deepcopy, the operator CRD and enterprise versions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jiawei Huang <jiawei@tigera.io>
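
As a rough illustration of the alias handling in this commit, the sketch below remaps legacy container names and applies the precedence rule stated in the API comment further down this page (the new field wins when both are set). The name table, the `containerOverride` type, and both function names are assumptions made for this example; the PR's `translateLegacyFluentdOverrides` is not shown on this page and may work differently.

```go
package main

import "fmt"

// containerOverride is a simplified stand-in for a per-container override entry.
type containerOverride struct {
	Name     string
	CPULimit string
}

// legacyContainerNames maps legacy fluentd names to fluent-bit names.
// The single entry here is illustrative, not the operator's actual table.
var legacyContainerNames = map[string]string{
	"fluentd": "calico-fluent-bit",
}

// translateLegacyNames returns a copy of the overrides with legacy names remapped.
// Names that are already current, or unknown, pass through unchanged.
func translateLegacyNames(in []containerOverride) []containerOverride {
	out := make([]containerOverride, len(in))
	for i, c := range in {
		if renamed, ok := legacyContainerNames[c.Name]; ok {
			c.Name = renamed
		}
		out[i] = c
	}
	return out
}

// effectiveOverrides prefers the new field and falls back to the translated
// deprecated field only when the new field is unset.
func effectiveOverrides(current, deprecated []containerOverride) []containerOverride {
	if current != nil {
		return current
	}
	return translateLegacyNames(deprecated)
}

func main() {
	legacy := []containerOverride{{Name: "fluentd", CPULimit: "500m"}}
	fmt.Println(effectiveOverrides(nil, legacy))
}
```

Translating on read, rather than rejecting the old names, is what lets an existing LogCollector spec keep validating and keep its resource settings across the upgrade.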

Forward flow, DNS and policy-activity logs from non-cluster hosts through
voltron to the in-cluster calico-fluent-bit http input, and on to Linseed.

- Grant dnslogs (alongside flowlogs and policyactivity) on the
  non-cluster-host ClusterRole so the minted host token passes voltron's
  SubjectAccessReview for the DNS ingestion path instead of 403ing.
- Set VOLTRON_LOG_COLLECTOR_CA_BUNDLE_PATH on the manager so voltron verifies
  the calico-fluent-bit http input's TLS server certificate against the
  trusted CA bundle it already mounts (the config default
  /etc/pki/tls/certs/ca.crt is not mounted, so the handshake otherwise fails).
- Pass NonClusterHost to the Windows fluent-bit configuration so the Linux
  and Windows renders produce the shared allow-calico-fluent-bit NetworkPolicy
  identically. Otherwise, on clusters with Windows nodes, the port-9880
  ingress rule (voltron -> http input) flapped on every reconcile and
  intermittently dropped voltron's access. Adds a controller regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jiawei Huang <jiawei@tigera.io>
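
The flapping described in this commit arises whenever two renders emit an object with the same name but different content, so each reconcile overwrites the other's version. The sketch below models that with a simplified `policy` type. The policy name and port 9880 come from the commit message, while the type, the function names, and the inclusion of port 2020 in the rule set are assumptions for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// policy is a simplified stand-in for the shared NetworkPolicy.
type policy struct {
	Name         string
	IngressPorts []int
}

// renderPolicy builds the shared policy for one OS render. The HTTP-input port
// is only allowed when the render is told that NonClusterHost is configured.
func renderPolicy(nonClusterHost bool) policy {
	p := policy{Name: "allow-calico-fluent-bit", IngressPorts: []int{2020}}
	if nonClusterHost {
		p.IngressPorts = append(p.IngressPorts, 9880)
	}
	return p
}

// samePolicy reports whether two renders would write identical objects.
func samePolicy(a, b policy) bool {
	return reflect.DeepEqual(a, b)
}

func main() {
	// Before the fix: only the Linux render knows about NonClusterHost, so the
	// two renders disagree and the last writer wins on every reconcile.
	fmt.Println(samePolicy(renderPolicy(true), renderPolicy(false)))

	// After the fix: both renders receive the same input and agree.
	fmt.Println(samePolicy(renderPolicy(true), renderPolicy(true)))
}
```

A regression test in this style only needs to assert that both renders produce equal objects for every input combination, which is cheaper than observing the flap on a live cluster.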

pkg/render/fluentbit.go had grown to ~1900 lines. Move the fluent-bit /
EKS log-forwarder rendering into a new pkg/render/logcollector package,
split across focused files (logcollector core, config, outputs, daemonset,
rbac, networkpolicy, eks_log_forwarder) plus the moved tests.

A small set of symbols stays in package render (new pkg/render/logcollector.go)
to avoid a render -> render/logcollector import cycle, since Guardian, Manager,
compliance, apiserver, dex and intrusion detection reference them: the
log-collector network-policy identity (FluentBitSourceEntityRule,
EKSLogForwarderEntityRule, LogCollectorNamespace, the fluent-bit node names,
FluentBitInputService), the shared Linseed-token constants, and the
TrustedBundleVolume helper. The logcollector package aliases these. The shared
pod helper setNodeCriticalPod is exported as SetNodeCriticalPod (matching its
sibling SetClusterCriticalPod).

Pure code move; no behavior change. Build, vet, unit tests, format-check and
gen-files all pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jiawei Huang <jiawei@tigera.io>
@hjiawei hjiawei force-pushed the fluent-bit-deploy branch from 98e7752 to 51596b3 Compare June 29, 2026 21:10
// as an alias for one release during the Fluentd → Fluent Bit migration;
// when both are set, CalicoFluentBitDaemonSet takes precedence.
// +optional
FluentdDaemonSet *FluentBitDaemonSet `json:"fluentdDaemonSet,omitempty"`


This is a breaking change since the container name has changed from fluentd to fluentbit.


Labels

docs-pr-required, enterprise (Feature applies to enterprise only), kind/enhancement (New feature or request), release-note-required

5 participants