chore: automated prune of superseded 5stack container images #513

Open
Flegma wants to merge 1 commit into main from chore/image-prune-lifecycle

Flegma commented Jun 19, 2026

Closes #503

What

Adds an automated, per-host cleanup for the containerd image snapshots that fill /var/lib/rancher/k3s/agent over time (the old ghcr.io/5stackgg/* layers that fix-versions.sh clears by hand today). It is a systemd timer + script installed on every node.

How it protects game-server / game-streamer (the #503 concern)

@lukepolo, this is built around your note that we must not clear the latest game-server / game-streamer images, since they usually are not running when the prune fires.

Every 5stack image is deployed as :latest. When a node pulls a new build, the :latest tag moves to the new digest and the previous version is left behind untagged (dangling). So the prune simply keys off the tag:

  • Keep any image that still holds a :latest tag, running or not. This preserves the current game-server / game-streamer images even while idle.
  • Remove only 5stack images that no longer hold a :latest tag (superseded versions).

This deliberately avoids crictl rmi --prune, which removes all unused images and would delete the idle-but-current game-server / game-streamer images, exactly the problem you flagged.

Pinned images (the pause sandbox) and non-5stack images are never touched.
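
The selection rule above could be sketched as a jq filter over `crictl images -o json`. This is an illustration under assumptions, not the contents of utils/5stack-image-prune.sh: the function name `select_prunable` is hypothetical, and the filter assumes each image entry exposes `id`, `repoTags`, `repoDigests` and `pinned`.

```shell
#!/usr/bin/env bash
# Sketch only: select_prunable is an illustrative name, not taken from the PR.

# Reads `crictl images -o json` on stdin and prints one prunable image ID per line.
select_prunable() {
  jq -r '
    .images[]
    # Never touch pinned images such as the pause sandbox.
    | select(.pinned != true)
    # Only consider images whose tag or digest reference is under ghcr.io/5stackgg/.
    | select(((.repoTags // []) + (.repoDigests // []))
             | any(startswith("ghcr.io/5stackgg/")))
    # Keep anything that still holds a :latest tag, running or not.
    | select((.repoTags // []) | any(endswith(":latest")) | not)
    | .id
  '
}

# Possible usage on a node (not executed here):
#   crictl images -o json | select_prunable | xargs -r crictl rmi
```

An image whose `repoTags` is empty or contains only `<none>:<none>` has no `:latest` tag, so both representations of a dangling image fall through to removal, while a non-5stack dangling image is excluded by the registry-prefix check.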

On kubelet GC

I did not add imageMaximumGCAge. Time-based kubelet GC cannot exclude specific images, so it would age out the idle game-server / game-streamer images for the same reason. The tag-aware script is the mechanism that can protect them; kubelet's default disk-pressure image GC stays in place as the safety net.

Files

  • utils/5stack-image-prune.sh - the tag-aware prune (crictl + jq; falls back to k3s crictl; no-ops cleanly if crictl/jq are absent or containerd isn't up yet).
  • utils/setup_image_prune.sh - installs the script and a weekly timer, idempotent and root-guarded.
  • update.sh - calls setup_image_prune. Every node setup (install.sh, game-node-server-setup.sh, game-streamer.sh) and standalone updates funnel through here, so existing nodes pick it up on the next update.
  • utils/utils.sh - sources the new helper.
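
The "falls back to k3s crictl; no-ops cleanly" behaviour described for the prune script might look roughly like the sketch below. The helper names `resolve_crictl` and `prune_prereqs_ok` are hypothetical, and using `crictl info` as the readiness probe is an assumption; the PR's script may detect a not-yet-running containerd differently.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not taken from the PR.

# Prints the crictl invocation to use, or returns 1 if none is available.
resolve_crictl() {
  if command -v crictl >/dev/null 2>&1; then
    echo "crictl"
  elif command -v k3s >/dev/null 2>&1; then
    echo "k3s crictl"   # k3s bundles crictl as a subcommand
  else
    return 1
  fi
}

# Returns 0 only when a prune can run; otherwise logs why it is skipping.
prune_prereqs_ok() {
  local crictl_cmd
  crictl_cmd="$(resolve_crictl)" || { echo "crictl not found, skipping" >&2; return 1; }
  command -v jq >/dev/null 2>&1 || { echo "jq not found, skipping" >&2; return 1; }
  # Assumption: `crictl info` fails while the runtime endpoint is unreachable.
  # $crictl_cmd is left unquoted on purpose so "k3s crictl" splits into two words.
  $crictl_cmd info >/dev/null 2>&1 || { echo "containerd not ready, skipping" >&2; return 1; }
}

# Possible usage at the top of a prune script (not executed here):
#   prune_prereqs_ok || exit 0
```

Exiting 0 on a skipped run keeps the systemd unit from being marked failed on hosts that lack jq or are still starting up.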

Config

IMAGE_PRUNE_ON_CALENDAR (a systemd OnCalendar= value, default weekly) tunes the schedule. The issue suggested monthly to start; I defaulted to weekly so snapshots don't build up between runs, but happy to change it.
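
As an illustration of how the schedule override could feed the timer, the sketch below renders a timer unit from IMAGE_PRUNE_ON_CALENDAR with a weekly default. The function name `render_prune_timer`, the unit description and the `Persistent=true` setting are assumptions for the example, not details confirmed by the PR.

```shell
#!/usr/bin/env bash
# Sketch only: render_prune_timer and the unit contents are hypothetical.

# Prints a systemd timer unit; the schedule comes from IMAGE_PRUNE_ON_CALENDAR
# and falls back to "weekly" when the variable is unset or empty.
render_prune_timer() {
  local on_calendar="${IMAGE_PRUNE_ON_CALENDAR:-weekly}"
  cat <<EOF
[Unit]
Description=Prune superseded 5stack container images

[Timer]
OnCalendar=${on_calendar}
# Assumption: catch up on a run missed while the node was powered off.
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Possible usage (not executed here):
#   IMAGE_PRUNE_ON_CALENDAR=monthly render_prune_timer
```

A candidate value can be checked before use with `systemd-analyze calendar "<value>"`, which prints the normalized form and the next elapse time.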

Testing

  • bash -n + shellcheck clean on both scripts.
  • Verified the selection logic against a sample crictl images -o json fixture: it keeps api:latest, game-server:latest and game-streamer:latest (the latter two as idle/not-running), keeps the pinned pause image and a non-5stack image, and removes only the dangling old versions (including the <none>:<none> representation).

Open questions

  1. Weekly vs monthly default for the timer?
  2. Scope is intentionally limited to ghcr.io/5stackgg/*. Want it to also clear non-5stack dangling images, or keep it conservative?
  3. The script needs jq; game/GPU nodes already require it, but the panel host currently does not. Add a jq check to install.sh, or rely on the script's graceful skip?

Adds a per-host systemd timer that reclaims disk from old 5stack container
image versions left behind under /var/lib/rancher/k3s/agent when
ghcr.io/5stackgg/*:latest is re-pulled to a new digest.

The prune keeps whatever image currently holds a :latest tag, so the current
game-server / game-streamer images are preserved even when they are not
running. A plain `crictl rmi --prune` would delete those idle-but-current
images (the concern raised on #503); keying off the tag instead avoids that.
Only superseded, now-untagged versions are removed.

- utils/5stack-image-prune.sh: tag-aware prune via crictl + jq
- utils/setup_image_prune.sh: installs the script + a weekly systemd timer
  (schedule configurable via IMAGE_PRUNE_ON_CALENDAR), idempotent, root-guarded
- wired into update.sh, the path every node setup funnels through, so existing
  nodes pick it up on the next update

Refs #503
Flegma requested a review from lukepolo June 19, 2026 08:45
Flegma marked this pull request as ready for review June 20, 2026 15:03

Development

Successfully merging this pull request may close these issues.

[CHORE] k3s snapshots lifecycle
