feat: add curated telegraf 1.38.2 build for Azure Linux 4.0 (#20399) #17798
WithEnoughCoffee wants to merge 4 commits into
Pull request overview
This PR re-introduces Telegraf (missing from Azure Linux 4.0, shipped in 3.0) as a new local component (base/comps/telegraf/). Because Telegraf's default build links ~400 plugins and the full transitive dependency tree, the spec uses a curated ("Balanced") plugin set (~104 Go build tags via GO_BUILDTAGS) to shrink the binary and its CVE/dependency surface, while still vendoring the full tree per Fedora Go packaging guidelines (%gometa, Go Vendor Tools, %gobuild). It adds systemd/sysusers/logrotate integration and a %check that validates the license expression and runs the binary. It resolves #20399.
Changes:
- Adds a hand-maintained local `telegraf.spec` with a curated `%global buildtags` plugin policy, Go Vendor Tools license macros, sysusers, systemd unit, logrotate, and default-config generation.
- Adds the component definition (`telegraf.comp.toml`, manual release), `go-vendor-tools.toml`, `telegraf.sysusers`, a reproducible vendor-tarball generator script, and the rendered specs/lock/sources.
- The vendor tarball source URI is currently a `127.0.0.1` placeholder pending lookaside upload (noted as a known follow-up).
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| base/comps/telegraf/telegraf.comp.toml | Local-spec component def, manual release, two source-files (upstream + vendor); vendor URI is a placeholder. |
| base/comps/telegraf/telegraf.spec | Curated Go build spec (buildtags, license macros, sysusers, systemd, %check). |
| base/comps/telegraf/go-vendor-tools.toml | askalono detector + manual SPDX entries for files the detector can't classify. |
| base/comps/telegraf/telegraf.sysusers | Declarative telegraf system user. |
| base/comps/telegraf/generate_source_tarball.sh | Reproducible go mod vendor tarball generator; comment references a stale macro name. |
| specs/t/telegraf/* | Rendered spec/sysusers/go-vendor-tools/sources (body matches base sources). |
| locks/telegraf.lock | Generated input-fingerprint lock. |
Key findings: the helper script's comment cross-references a non-existent %{plugin_tags} macro (spec uses %{buildtags}), and the vendor source URI is an unresolved 127.0.0.1 placeholder that blocks CI fetch/build until replaced. Because this introduces a brand-new forked local spec (a long-term maintenance commitment) for a vendored Go package with a large curated plugin policy, license-expression tracking, and an unresolved source URI, it warrants human review.
    #
    # The custom (minimal-plugin) build still vendors the *full* dependency tree so
    # the build is hermetic and offline; only the selected plugins are compiled in
    # (see %%{plugin_tags} in telegraf.spec). Pruning vendor/ is intentionally NOT
c23dfcb to a8edaec
Restore telegraf (absent from AzL 4.0) as a general-purpose metrics agent, packaged per Fedora's modern Go guidelines.
- Rebased on the go2rpm --profile vendor scaffold (Go Vendor Tools, vendored deps, %gobuild with GO_BUILDTAGS/GO_LDFLAGS), matching AzL's existing vendored-Go pattern (rootlesskit, git-lfs) and keeping it Fedora-upstreamable.
- Curated "Balanced" plugin set (108 build tags: 63 inputs, 15 outputs, 7 processors, 4 aggregators, 12 parsers, 7 serializers) compiled via the upstream custom builder instead of the full ~400-plugin default. The full vendor tree is still shipped; curation only changes what is linked (261 linked modules vs 603 vendored).
- Plugin policy lives entirely in one spec macro (%global buildtags), so adding or removing a plugin is a one-line change with no other spec edits.
- Includes the full first-party Azure plugin set (azure_monitor in/out, azure_storage_queue, eventhub_consumer, azure_data_explorer) and the github input, since AzL is an Azure/Microsoft + GitHub product.
- systemd hardening drop-in (ProtectSystem=full, ProtectHome, ProtectKernel*, ProtectControlGroups, RestrictRealtime, HOME=/var/lib/telegraf); PrivateDevices omitted so hardware inputs (smart, ipmi_sensor, infiniband) keep /dev access.
- Cumulative SPDX License computed with go_vendor_license and enforced by %go_vendor_license_check; bundled(golang(...)) provides auto-generated.
- Runtime packaging: sysusers (no userdel on uninstall), systemd unit, logrotate, generated default config, state dir.
Verified with a full mock build (all phases incl. %check) plus install/erase scriptlet and functional plugin tests.
a8edaec to 6a6f026
issue(blocking): While having a list of required follow-ups is good, make sure to remove it before we merge.
tobiasb-ms left a comment
question(blocking): I left a couple specific comments about this, but what are the practical differences between this and how we packaged it for AZL3? I know we've changed to use a fedora-blessed way of packaging, and that seems righteous. And of course we bumped the version. But I think there are more semantic differences here and we need to know what they are and why we're making them before taking this change.
    # The custom (minimal-plugin) build still vendors the *full* dependency tree so
    # the build is hermetic and offline; only the selected plugins are compiled in
    # (see %%{plugin_tags} in telegraf.spec). Pruning vendor/ is intentionally NOT
    # done — it would break reproducibility and `go mod verify`.
question(non-blocking): Just curious: how would pruning vendor/ break reproducibility? It would still be the same code, right? Or what am I missing?
The thing is, Go's vendoring is all-or-nothing at the module level: build tags exclude packages from compilation but don't affect the module graph or what go mod vendor pulls in. Even if a plugin is disabled via tags, its dependencies still appear in go.mod's require list, so they're part of the build list and must be in vendor/. Pruning them would require regenerating modules.txt, which changes the SRPM's cumulative SPDX license metadata and bundled golang declarations. That's why we keep the full vendor tree despite the binary not using all of it.
Hand-deleting directories from vendor/ (without also editing `go.mod` / `modules.txt`) makes the tree inconsistent, and the build fails immediately with inconsistent vendoring / go mod verify errors. The custom build tags exclude plugins from compilation, but those modules are still `require`d in `go.mod`, so they stay in the build list.
Also, keeping the full tree makes it easier to add plugins if we find out customers need plugins we don't have in the curated list.
Pruning "properly" means editing the module graph, which can change the binary. To drop them cleanly you'd remove the `require`s from `go.mod` and regenerate `modules.txt`. That changes the module graph, and under MVS (minimal version selection) that can shift the selected version of a shared transitive dependency still used by a compiled-in plugin, which would actually change the output. So the only safe-and-identical option is "vendor everything, compile a subset."
Auditability. We compute the cumulative SPDX License: and the bundled(golang(...)) provides over the full vendor tree. A pruned tree would make both tag-combination-dependent and fragile to maintain.
Fedora policy requires vendoring the complete dependency set anyway.
These are great arguments for why we should not prune. But as I said I'm curious as to why pruning actually changes reproducibility of the vendored tarball. Yes, it would clearly change it but once it's changed, wouldn't the result be just as reproducible?
    echo "Tar vendored modules (deterministic flags for reproducibility)."
    tar --sort=name \
        --mtime="2021-04-26 00:00Z" \
question(non-blocking): Where did this date come from? It seems pretty arbitrary.
Good catch, it's arbitrary. The only requirement for a reproducible tarball is that the mtime is a fixed constant, so the literal value carries no meaning, which is exactly why it reads as a magic number. I'll switch it to the convention the rest of the repo already uses for vendor/bundle tarballs (`grafana`, `sleef`, `rust-zip`, `python-pdfminer`, …): honor SOURCE_DATE_EPOCH when the build provides it, and fall back to a documented fixed epoch instead of an unexplained date. Updated in the next push.
Will SOURCE_DATE_EPOCH change across runs or is it consistent? I agree with having something that is consistent and long-lasting; it doesn't need to change to SOURCE_DATE_EPOCH or anything. Just wanted to make sure the date wasn't something actually special.
I'm changing this to non-blocking and you can decide the right way.
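For reference, a minimal sketch of the pattern discussed in this thread: honor SOURCE_DATE_EPOCH when the caller exports it, otherwise fall back to a fixed constant. It assumes GNU tar and gzip. The function name `make_vendor_tarball` and the `FALLBACK_EPOCH` value are hypothetical, chosen for illustration; this is not the code in generate_source_tarball.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical fallback: an arbitrary but fixed epoch. Any constant works,
# as long as it never changes between runs.
FALLBACK_EPOCH=1619395200

# make_vendor_tarball SRC_DIR OUT_FILE
# Packs SRC_DIR into a gzip-compressed tarball whose bytes do not depend on
# file timestamps, file ordering on disk, or the invoking user.
make_vendor_tarball() {
    local src_dir="${1}"
    local out_file="${2}"
    # SOURCE_DATE_EPOCH is only as stable as whoever sets it; when it is
    # unset, the fixed fallback keeps the output stable.
    local epoch="${SOURCE_DATE_EPOCH:-${FALLBACK_EPOCH}}"

    # --sort=name        : stable member order
    # --mtime/--clamp... : cap every mtime at ${epoch}
    # --owner/--group    : drop the builder's uid/gid
    # gzip -n            : omit the name and timestamp from the gzip header
    tar --sort=name \
        --mtime="@${epoch}" --clamp-mtime \
        --owner=0 --group=0 --numeric-owner \
        -C "$(dirname "${src_dir}")" \
        -cf - "$(basename "${src_dir}")" | gzip -n > "${out_file}"
}
```

Called as `make_vendor_tarball ./vendor ./vendor.tar.gz`, two runs over the same tree should produce byte-identical archives. Note that `--clamp-mtime` only rewrites timestamps newer than the epoch, so files older than the chosen value keep their own mtime.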
        esac
    done

    echo "--srcTarball -> $SRC_TARBALL"
suggestion(non-blocking): Here and throughout the file, best practice in bash is to wrap variable names in braces (so ${SRC_TARBALL}). Not critical at all, but if you end up making other changes you can consider this.
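A small sketch of why the braced form is the safer habit. Strictly speaking this is parameter expansion with braces, not bash's `{a,b}` brace expansion. The variable value and the `_bak` suffix below are made up for illustration.

```shell
#!/usr/bin/env bash

SRC_TARBALL="telegraf.tar.gz"   # hypothetical value, for illustration only

# Ambiguous: bash reads this as the (unset) variable SRC_TARBALL_bak,
# which expands to an empty string.
without_braces="$SRC_TARBALL_bak"

# Unambiguous: the braces end the variable name before the literal suffix.
with_braces="${SRC_TARBALL}_bak"

echo "--srcTarball -> ${SRC_TARBALL}"
printf 'without braces: "%s"\n' "${without_braces}"
printf 'with braces:    "%s"\n' "${with_braces}"
```

With `set -u` enabled, the unbraced line would abort the script with an unbound-variable error instead of silently expanding to nothing.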
    # Shipped in AzL 3.0 but missing from 4.0; this restores it as a general-purpose metrics
    # agent. Not available in Fedora; maintained as a local spec with vendored Go dependencies.
    # Curated ("Balanced") custom build keeps the binary's dependency/CVE surface small.
    # Packaged per Fedora Go guidelines for upstreaming.
issue(blocking): Add a link to the Fedora Go guidelines.
    @@ -0,0 +1,21 @@
    # Azure Linux systemd sandboxing for the telegraf metrics agent.
question(non-blocking): Did we have this in AZL3? If not, why do we need it here?
No, AZL3 didn't have this: it installed upstream's telegraf.service as-is, with no sandboxing. This is a net-new addition for AZL4.
Rationale: telegraf is a long-lived daemon running as an unprivileged user (`User=telegraf`) whose job is only to read host metrics. That's an ideal candidate for systemd sandboxing as defense-in-depth: if a plugin or a vendored dependency is ever compromised, these restrictions limit blast radius (read-only system, no access to `/home`, kernel tunables/modules/logs protected, no realtime, etc.). The directives are stock systemd hardening adapted from openSUSE's hardening effort.
It's shipped as a drop-in (`50-hardening.conf`) rather than a forked unit, so it overlays upstream without diverging the unit file, which keeps it easy to audit and tweak. It's also deliberately conservative so it doesn't break functionality:
- `PrivateDevices` is omitted because the curated build includes hardware inputs (`smart`, `smartctl`, `ipmi_sensor`, `infiniband`) that need `/dev`.
- `HOME=/var/lib/telegraf` is set because some plugin SDKs (e.g. the Azure SDK credential cache) write under `$HOME`.
It's a security improvement that aligns with AzL's enterprise hardening posture.
It's possible that I'm being paranoid here, but this feels risky. We're changing the thing we deliver to a customer.
I agree in principle about using stricter security settings, though.
I'll change this to non-blocking, but let's leave the conversation open for others (maybe @reubeno or @ddstreet) to weigh in.
    [components.telegraf]
    spec = { type = "local", path = "telegraf.spec" }
    # Local Go package: changelog/Release are hand-managed, not derived by rpmautospec.
    release = { calculation = "manual" }
issue(blocking): Why wouldn't we use autorelease here?
A few reasons, and they line up with how AzL already handles this. The spec doesn't actually use %autorelease: it has a static Release: 1%{?dist} and a literal %changelog. The autorelease calculation mode is specifically for specs that contain the %autorelease macro (it forces azldev to preserve and expand it), so pointing it at a static-release spec would be the wrong mode.
Beyond that, autorelease isn't really the AzL convention. Across base/comps only 2 of ~173 components use it (`samba` and `gvisor-tap-vsock`), and no local spec uses it at all. AzL seems to render concrete, checked-in specs under specs/ with explicit Release and changelog values for reproducible, auditable, offline builds; rpmautospec's git-derived %autorelease / %autochangelog is really an upstream-Fedora dist-git mechanism and isn't how AzL builds locally. Setting manual also mirrors the existing precedent: the other AzL-originating local package, `azurelinux-release`, uses the exact same setup (`type = "local"`, `release = { calculation = "manual" }`, a static `Release:`, and a hand-written `%changelog`), so I followed that house pattern to keep the curated changelog and release under explicit human control.
Eventually, when this package is proposed to Fedora dist-git, the Fedora-native choice is %autorelease / %autochangelog (go2rpm emits them), and we can adopt those at submission time; inside the AzL monorepo we just match AzL's convention. And if you'd rather have azldev auto-bump the integer on each component-touching commit, `auto` (the default) would probably be what we're looking for, rather than `autorelease`.
Happy to switch to that, though manual matches `azurelinux-release` and keeps the hand-written changelog authoritative.
The argument that the spec doesn't use %autorelease is circular. That's what I'm asking.
2 of 173 components is misleading: aren't the vast majority of those overlays on a Fedora thing? A quick search I just did found four actual spec files in base/comps, two of which currently use %autorelease.
@reubeno may have some opinions about this.
For any new specs we author, we should use %autorelease.
issue(blocking): This isn't in
rpm-layout, build (blocking): Can you please confirm this builds on koji? I also saw SUSE was the reference for defaults. We should lean towards Fedora/CentOS/Red Hat. Their homedir is /etc/telegraf. `dnf repoquery -l telegraf | grep -v build-id`
Reference %{buildtags} (the actual spec macro) instead of the
non-existent %{plugin_tags}, and drop the spec-only %% escaping
since this is a shell script. Refresh lock fingerprint.
Good question. Here's the full inventory of what changed from AZL3 (
A. Semantic / behavioral changes (the ones to sign off on)
B. Mechanical / compliance changes (no behavioral impact)
Net: the only changes that affect runtime behavior are the curated plugin set (#1–2), the no-user-deletion policy (#3), tighter config perms (#4), and the sandboxing drop-in (#5). Everything else is version/toolchain/compliance hygiene. Happy to expand the plugin set or relax any of these if they conflict with a known consumer.
Agreed, good callout. I will keep that in mind.
- Reproducible vendor tarball: honor SOURCE_DATE_EPOCH (clamp-mtime) instead of an arbitrary fixed date, matching the grafana/sleef/rust-zip convention.
- Create output folder before resolving the archive path so a clean/nested --outFolder works.
- Use brace expansion throughout generate_source_tarball.sh.
- Link the Fedora Go packaging guidelines in telegraf.comp.toml.
- Publish to the base repo by adding telegraf to components-publish-channels.toml.
- Refresh vendor tarball SHA512 (mtime change) in comp.toml + rendered sources.
- Refresh lock.
        -*)
            echo "Error: unsupported flag ${1}." >&2
            exit 1
            ;;
        esac
Revert the vendor tarball SHA512 to 68d9ef0d…89cf3, the artifact that is deterministically reproducible from generate_source_tarball.sh under the Azure Linux Go toolchain (Go 1.25.x). The previously pinned bfb5291b… hash came from a regeneration in a different environment and is not reproducible here, so it could not be verified against the canonical vendor archive. Updates comp.toml (hash + lookaside origin URI), the rendered spec sources file, and refreshes the component lock via azldev component update.
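As an illustration of the check this commit relies on, here is a minimal sketch that compares a regenerated tarball against a pinned SHA512. The helper name `verify_sha512` is hypothetical and is not part of azldev or generate_source_tarball.sh; it assumes `sha512sum` from coreutils is available.

```shell
#!/usr/bin/env bash
set -euo pipefail

# verify_sha512 FILE EXPECTED_HEX
# Returns 0 only when FILE's SHA-512 digest equals EXPECTED_HEX.
verify_sha512() {
    local file="${1}"
    local expected="${2}"
    local actual

    # sha512sum prints "<hex digest>  <file name>"; keep only the digest.
    actual="$(sha512sum "${file}" | awk '{print $1}')"

    if [[ "${actual}" != "${expected}" ]]; then
        echo "SHA512 mismatch for ${file}" >&2
        echo "  expected: ${expected}" >&2
        echo "  actual:   ${actual}" >&2
        return 1
    fi
    echo "SHA512 OK: ${file}"
}
```

The expected value would be the full digest pinned in comp.toml (abbreviated above as 68d9ef0d…89cf3); a mismatch after regenerating in a given environment is the signal that the archive is not reproducible there.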
Summary
Telegraf shipped in Azure Linux 3.0 but is missing from 4.0. This restores it as a general-purpose, plugin-driven agent for collecting, processing, aggregating, and writing metrics. Resolves #20399.

Why a curated build
Built upstream-default, telegraf links ~400 plugins and the full transitive dependency tree: a large CVE surface, vendor footprint, and binary for a distro we maintain. Instead we compile a curated ("Balanced", 108 build tags) set: 63 inputs, 15 outputs, 7 processors, 4 aggregators, 12 parsers, 7 serializers. The rest are absent from the binary at build time.
- `azure_monitor` (in/out), `azure_storage_queue`, `eventhub_consumer`, `azure_data_explorer`, and the `github` input are all included, since AzL is an Azure/Microsoft + GitHub product and these should work by default.
- Plugin policy lives entirely in one spec macro (`%global buildtags`). Adding (or removing) a plugin is a one-line change: append its tag to the macro, e.g.:

      inputs.cpu inputs.disk inputs.diskio inputs.mem inputs.net inputs.netstat \
    + inputs.redis \

- No `%build`/`%install`/`%files` changes are needed, so curation stays easy to audit and evolve as requirements change.

Packaging (Fedora Go guidelines)
Uses the `go2rpm --profile vendor` scaffold as the baseline (Go Vendor Tools, vendored deps, `%gobuild` with `GO_BUILDTAGS`/`GO_LDFLAGS`), so it can be upstreamed to Fedora and matches the vendored-Go pattern AzL already uses (`rootlesskit`, `git-lfs`). Divergences are marked `# AzL:`. The full vendor tree is retained (Fedora requires it); curation only affects what is compiled. The cumulative SPDX `License` tag is computed with `go_vendor_license` and enforced by `%go_vendor_license_check`; `bundled(golang(...))` provides are auto-generated.

systemd hardening
A drop-in (`50-hardening.conf`) sandboxes the unprivileged agent (`ProtectSystem=full`, `ProtectHome`, `ProtectHostname`, `ProtectClock`, `ProtectKernel{Tunables,Modules,Logs}`, `ProtectControlGroups`, `RestrictRealtime`, `HOME=/var/lib/telegraf`), adapted from openSUSE. `PrivateDevices` is intentionally omitted so hardware inputs (smart, smartctl, ipmi_sensor, infiniband) keep `/dev` access.

Contents
- `telegraf.spec`: `%gometa`, curated `%global buildtags`, Go Vendor Tools license macros, sysusers (no `userdel` on uninstall), systemd unit + hardening drop-in, logrotate, generated default config, state dir, `%check` (license check + binary smoke test).
- `telegraf-hardening.conf`: systemd sandboxing drop-in.
- `go-vendor-tools.toml`: askalono detector + manual license entries.
- `telegraf.comp.toml`: upstream source plus the full vendor tarball.
- `telegraf.sysusers`, `generate_source_tarball.sh`, `locks/telegraf.lock`.

Verification
Full mock build passes every phase including `%check`. Confirmed in mock:
- Telegraf 1.38.2; functional collection works (`cpu` input loads and emits).
- Curated present (`azure_monitor` in/out, `azure_data_explorer`, `github`, `eventhub_consumer`, docker, prometheus, snmp, …); non-curated absent (cloudwatch, nvidia_smi, vsphere, ecs).
- `telegraf` user (home `/var/lib/telegraf`), unit + drop-in install; on erase the user is intentionally retained.
- `systemd-analyze verify` accepts the unit + drop-in; debuginfo is split into its own subpackage.

Why 1.38.2 (not 1.39.0)
telegraf 1.39.0's `go.mod` requires Go 1.26; AzL 4.0 currently ships Go 1.25.8, and 1.38.2 is the latest release that builds on it. We can bump to 1.39.0 once AzL golang reaches ≥ 1.26 (which also drops the logzio azure-monitor dependency).

Known follow-up
- `comp.toml` source URI updated to the published URL before CI source checks and package builds can fetch it (currently a placeholder).