chore: automated prune of superseded 5stack container images #513
Open
Flegma wants to merge 1 commit into
Adds a per-host systemd timer that reclaims disk from old 5stack container image versions left behind under /var/lib/rancher/k3s/agent when ghcr.io/5stackgg/*:latest is re-pulled to a new digest.

The prune keeps whatever image currently holds a :latest tag, so the current game-server / game-streamer images are preserved even when they are not running. A plain `crictl rmi --prune` would delete those idle-but-current images (the concern raised on #503); keying off the tag instead avoids that. Only superseded, now-untagged versions are removed.

- utils/5stack-image-prune.sh: tag-aware prune via crictl + jq
- utils/setup_image_prune.sh: installs the script + a weekly systemd timer (schedule configurable via IMAGE_PRUNE_ON_CALENDAR), idempotent, root-guarded
- wired into update.sh, the path every node setup funnels through, so existing nodes pick it up on the next update

Refs #503
Closes #503
What
Adds an automated, per-host cleanup for the containerd image snapshots that fill `/var/lib/rancher/k3s/agent` over time (the old `ghcr.io/5stackgg/*` layers that `fix-versions.sh` clears by hand today). It is a `systemd` timer + script installed on every node.

How it protects game-server / game-streamer (the #503 concern)
@lukepolo, this is built around your note that we must not clear the latest `game-server` / `game-streamer` images, since they usually are not running when the prune fires.

Every 5stack image is deployed as `:latest`. When a node pulls a new build, the `:latest` tag moves to the new digest and the previous version is left behind untagged (dangling). So the prune simply keys off the tag:

- Keeps any image that holds a `:latest` tag, running or not. This preserves the current `game-server` / `game-streamer` images even while idle.
- Removes 5stack images with no `:latest` tag (superseded versions).

This deliberately avoids `crictl rmi --prune`, which removes all unused images and would delete the idle-but-current `game-server` / `game-streamer` images, exactly the problem you flagged.

Pinned images (the pause sandbox) and non-5stack images are never touched.
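The keep/remove rule above can be sketched as a single `jq` filter over `crictl images -o json`. This is a minimal illustration, not the PR's actual script: the function name `list_superseded_ids` and the `REPO_PREFIX` variable are hypothetical, and it assumes the JSON carries `id`, `repoTags`, `repoDigests` and `pinned` for each image.

```shell
#!/usr/bin/env bash
# Sketch only: reads `crictl images -o json` on stdin and prints the IDs of
# superseded 5stack images. Names below are hypothetical, not from the PR.
REPO_PREFIX="ghcr.io/5stackgg/"

list_superseded_ids() {
  jq -r --arg prefix "$REPO_PREFIX" '
    .images[]?
    # Never touch pinned images (for example the pause sandbox image).
    | select(.pinned != true)
    # Only consider images whose tag or digest reference is under the 5stack prefix.
    | select(any(.repoTags[]?, .repoDigests[]?; startswith($prefix)))
    # Keep anything that still holds a :latest tag, running or not.
    # An empty tag list or a "<none>:<none>" tag both count as untagged here.
    | select(all(.repoTags[]?; endswith(":latest") | not))
    | .id
  '
}
```

One possible way to use it is `crictl images -o json | list_superseded_ids` as a dry run first, and only then pass the printed IDs to `crictl rmi`.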
On kubelet GC
I did not add `imageMaximumGCAge`. Time-based kubelet GC can't exclude specific images, so it would age out idle `game-server` / `game-streamer` for the same reason. The tag-aware script is the mechanism that can protect them; kubelet's default disk-pressure image GC stays as the emergency net.

Files
- `utils/5stack-image-prune.sh` - the tag-aware prune (`crictl` + `jq`; falls back to `k3s crictl`; no-ops cleanly if `crictl`/`jq` are absent or containerd isn't up yet).
- `utils/setup_image_prune.sh` - installs the script and a weekly timer, idempotent and root-guarded.
- `update.sh` - calls `setup_image_prune`. Every node setup (`install.sh`, `game-node-server-setup.sh`, `game-streamer.sh`) and standalone updates funnel through here, so existing nodes pick it up on the next update.
- `utils/utils.sh` - sources the new helper.

Config
`IMAGE_PRUNE_ON_CALENDAR` (a systemd `OnCalendar=` value, default `weekly`) tunes the schedule. The issue suggested monthly to start; I defaulted to weekly so snapshots don't build up between runs, but happy to change it.

Testing
- `bash -n` + `shellcheck` clean on both scripts.
- Checked against a `crictl images -o json` fixture: it keeps `api:latest`, `game-server:latest` and `game-streamer:latest` (the latter two as idle/not-running), keeps the pinned pause image and a non-5stack image, and removes only the dangling old versions (including the `<none>:<none>` representation).

Open questions
- Scope: the prune only touches `ghcr.io/5stackgg/*`. Want it to also clear non-5stack dangling images, or keep it conservative?
- Dependency: the script needs `jq`; game/GPU nodes already require it, but the panel host does not currently. Add a `jq` check to `install.sh`, or rely on the script's graceful skip?
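
For context on the second question, the "graceful skip" could look roughly like the preflight below. This is a hedged sketch rather than the code in `utils/5stack-image-prune.sh`: the function names `resolve_crictl` and `prune_preflight` and the log messages are hypothetical, and it assumes `crictl info` failing is a good enough signal that containerd isn't up yet.

```shell
#!/usr/bin/env bash
# Sketch only: function names and messages are hypothetical, not from the PR.

# Print the crictl command to use, preferring a standalone binary and
# falling back to the one bundled with k3s. Returns non-zero if neither exists.
resolve_crictl() {
  if command -v crictl >/dev/null 2>&1; then
    echo "crictl"
  elif command -v k3s >/dev/null 2>&1; then
    echo "k3s crictl"
  else
    return 1
  fi
}

# Print the usable crictl command on success. On any missing prerequisite,
# log a reason to stderr and return non-zero so the caller can skip the prune.
prune_preflight() {
  local crictl_cmd
  if ! command -v jq >/dev/null 2>&1; then
    echo "image-prune: jq not found, skipping" >&2
    return 1
  fi
  if ! crictl_cmd="$(resolve_crictl)"; then
    echo "image-prune: crictl not found, skipping" >&2
    return 1
  fi
  # Intentionally unquoted so "k3s crictl" splits into command + subcommand.
  if ! $crictl_cmd info >/dev/null 2>&1; then
    echo "image-prune: container runtime not reachable, skipping" >&2
    return 1
  fi
  echo "$crictl_cmd"
}
```

A caller could then do `crictl_cmd="$(prune_preflight)" || exit 0`, so a node without `jq` ends the timer run cleanly instead of failing the unit. An explicit `jq` check in `install.sh` would surface the gap earlier, at the cost of a new hard requirement on the panel host.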