feat(otel-collector): add OpenTelemetryCollector CRD, controller, and render #4979
Draft
tianfeng92 wants to merge 17 commits into
Replace the fluentd DaemonSet with fluent-bit for log collection and forwarding. The LogCollector controller now renders the calico-fluent-bit DaemonSet (Linux and Windows) in calico-system, and pkg/render/fluentd.go is replaced by fluentbit.go.

- Ship fluent-bit logs to Linseed through its built-in http output.
- Rename the FluentdDaemonSet* API types to FluentBitDaemonSet* (fluentd_daemonset_types.go -> fluentbit_daemonset_types.go). Preserve the deprecated fluentdDaemonSet override field name/json tag as an alias and widen its enums to accept both the new calico-fluent-bit* names and the legacy fluentd names so existing LogCollector specs still validate; translateLegacyFluentdOverrides remaps the legacy names.
- Warn on invalid fluent-bit-filters ConfigMap content (e.g. left in fluentd <filter> syntax) instead of silently dropping it.
- Drop the bogus "calico-fluent-bit" entry from the manager cluster-wide namespace list; fluent-bit runs in calico-system, which is already listed.
- Regenerate deepcopy, the operator CRD and enterprise versions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jiawei Huang <jiawei@tigera.io>
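The legacy-name remapping described above can be pictured with a small lookup table. This is a minimal sketch, not the PR's translateLegacyFluentdOverrides: the helper name `remapLegacyName` and the concrete legacy/current name pairs are assumptions chosen for illustration.

```go
package main

import "fmt"

// legacyToCurrent maps deprecated override names to their replacements.
// The concrete names are illustrative assumptions, not taken from the PR.
var legacyToCurrent = map[string]string{
	"fluentd-node":         "calico-fluent-bit",
	"fluentd-node-windows": "calico-fluent-bit-windows",
}

// remapLegacyName (hypothetical helper) returns the current name for a legacy
// one and reports whether a translation happened. Names that are not in the
// table pass through unchanged, so already-migrated specs are left alone.
func remapLegacyName(name string) (string, bool) {
	if current, ok := legacyToCurrent[name]; ok {
		return current, true
	}
	return name, false
}

func main() {
	for _, n := range []string{"fluentd-node", "calico-fluent-bit", "other"} {
		out, changed := remapLegacyName(n)
		fmt.Printf("%s -> %s (remapped=%t)\n", n, out, changed)
	}
}
```

Accepting both spellings at validation time and normalizing afterwards keeps existing specs valid while the rest of the code only has to deal with the new names.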
Forward flow, DNS and policy-activity logs from non-cluster hosts through voltron to the in-cluster calico-fluent-bit http input, and on to Linseed.

- Grant dnslogs (alongside flowlogs and policyactivity) on the non-cluster-host ClusterRole so the minted host token passes voltron's SubjectAccessReview for the DNS ingestion path instead of 403ing.
- Set VOLTRON_LOG_COLLECTOR_CA_BUNDLE_PATH on the manager so voltron verifies the calico-fluent-bit http input's TLS server certificate against the trusted CA bundle it already mounts (the config default /etc/pki/tls/certs/ca.crt is not mounted, so the handshake otherwise fails).
- Pass NonClusterHost to the Windows fluent-bit configuration so the Linux and Windows renders produce the shared allow-calico-fluent-bit NetworkPolicy identically. Otherwise, on clusters with Windows nodes, the port-9880 ingress rule (voltron -> http input) flapped on every reconcile and intermittently dropped voltron's access. Adds a controller regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jiawei Huang <jiawei@tigera.io>
pkg/render/fluentbit.go had grown to ~1900 lines. Move the fluent-bit / EKS log-forwarder rendering into a new pkg/render/logcollector package, split across focused files (logcollector core, config, outputs, daemonset, rbac, networkpolicy, eks_log_forwarder) plus the moved tests.

A small set of symbols stays in package render (new pkg/render/logcollector.go) to avoid a render -> render/logcollector import cycle, since Guardian, Manager, compliance, apiserver, dex and intrusion detection reference them:

- the log-collector network-policy identity (FluentBitSourceEntityRule, EKSLogForwarderEntityRule, LogCollectorNamespace, the fluent-bit node names, FluentBitInputService)
- the shared Linseed-token constants
- the TrustedBundleVolume helper

The logcollector package aliases these. The shared pod helper setNodeCriticalPod is exported as SetNodeCriticalPod (matching its sibling SetClusterCriticalPod).

Pure code move; no behavior change. Build, vet, unit tests, format-check and gen-files all pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jiawei Huang <jiawei@tigera.io>
… render

Add the full OTel Collector operator support: CRD types, controller with license gating, render component, deployment override validation, fluentd integration, and unit tests for all layers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the fluentforward protocol with a custom fluentdhttp receiver that accepts Fluentd's out_http JSON format, enabling future mTLS support via the OTel confighttp framework.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pipeline

Route metrics from prometheus receiver to a dedicated prometheusremotewrite exporter instead of sharing log exporters. Add Prometheus port 9090 to network policy egress rules when metrics are enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
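The conditional egress rule can be sketched as follows. This is a simplified illustration under stated assumptions: `egressRule` and `buildEgressRules` are hypothetical stand-ins for the operator's network policy types and helpers, and only the port number 9090 comes from the commit message.

```go
package main

import "fmt"

// egressRule is a simplified, hypothetical stand-in for a network policy rule.
type egressRule struct {
	Protocol string
	Port     uint16
}

// prometheusPort is the port named in the commit message.
const prometheusPort uint16 = 9090

// buildEgressRules (hypothetical) emits one TCP allow rule per distinct
// destination port, adding the Prometheus port only when metrics are enabled.
func buildEgressRules(exporterPorts []uint16, metricsEnabled bool) []egressRule {
	ports := append([]uint16{}, exporterPorts...)
	if metricsEnabled {
		ports = append(ports, prometheusPort)
	}
	seen := make(map[uint16]bool, len(ports))
	rules := make([]egressRule, 0, len(ports))
	for _, p := range ports {
		if seen[p] {
			continue // skip duplicates so the rendered policy stays stable
		}
		seen[p] = true
		rules = append(rules, egressRule{Protocol: "TCP", Port: p})
	}
	return rules
}

func main() {
	fmt.Println(buildEgressRules([]uint16{4318}, true))
	fmt.Println(buildEgressRules([]uint16{4318}, false))
}
```

Deriving the rules from the enabled pipelines keeps the policy no wider than the collector actually needs.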
…tests

Verify the metrics pipeline uses the dedicated prometheusremotewrite exporter and that the network policy includes Prometheus port egress when metrics are enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nd mTLS

Add prometheus receiver with Kubernetes SD and mTLS for scraping calico-node metrics. Render TLS volume mounts from certificate manager. Switch config generation from string builder to Go template for maintainability. Add memory_limiter processor and resource limits. Wire OTel log types through to fluent-bit output rendering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fluent-bit's native opentelemetry output plugin is already compiled in, so we skip the bridge phase and go straight to OTLP end-to-end. Remove LogForwarderProtocol abstraction, FluentForwardPort, and fluentdhttp references. Fluent-bit now uses out_opentelemetry targeting port 4318.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Guard body field access with IsMap(body) to prevent "log bodies of type Str cannot be indexed" warnings when logs arrive as plain strings. Simplify audit classification to match all audit logs by auditID only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
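The guard itself lives in the collector's OTTL statements, which are not shown on this page. As a rough Go analogy only, the same idea is "check that the body is a map before indexing it". The function name `isAuditLog` and the `any`-typed body are assumptions for this sketch.

```go
package main

import "fmt"

// isAuditLog (hypothetical) mirrors the guard-then-index pattern: a log body
// is only inspected for the auditID key if it is actually a map. Plain string
// bodies are rejected up front instead of triggering an indexing error.
func isAuditLog(body any) bool {
	fields, ok := body.(map[string]any)
	if !ok {
		return false // body is a string or other scalar, nothing to index
	}
	_, hasAuditID := fields["auditID"]
	return hasAuditID
}

func main() {
	fmt.Println(isAuditLog(map[string]any{"auditID": "abc", "verb": "get"}))
	fmt.Println(isAuditLog("plain text log line"))
}
```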
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…te otel-collector image Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n and logcollector controller

The OTel collector has its own controller and render path; the fluentd render code doesn't need OTelCollectorEnabled or OTelLogTypes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…metrics, and dynamic egress rules Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
69eaaa5 to 48c6ba9
	Action:   v3.Allow,
	Protocol: &networkpolicy.TCPProtocol,
	Destination: v3.EntityRule{
		Ports: networkpolicy.Ports(uint16(p)),
…ess rule Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…prometheus

The "no metrics disabled" test checked for absence of "prometheus:" which now always appears in the telemetry.metrics.readers block. Assert on "scrape_configs:" instead, which is specific to the prometheus receiver.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- OpenTelemetryCollector CRD with support for logs (OTLP receiver) and metrics (Prometheus receiver) pipelines
- LogCollector types to carry OTel-related fields for fluent-bit integration
- CombinedCalicoImage) consistent with upstream refactor

Changes
- api/v1/otelcollector_types.go, api/v1/logcollector_types.go, zz_generated.deepcopy.go
- pkg/controller/otelcollector/controller.go, internal/controller/
- pkg/render/otelcollector/component.go
- pkg/common/validation/otelcollector/validation.go
- pkg/controller/logcollector/logcollector_controller.go
- pkg/render/otelcollector/component_test.go, pkg/controller/otelcollector/otelcollector_controller_test.go
- pkg/imports/crds/operator/

Test plan
- make ut UT_DIR=./pkg/render/otelcollector
- make ut UT_DIR=./pkg/controller/otelcollector
- make ci)

🤖 Generated with Claude Code