Skip to content

Improve build system and CI/CD pipeline#140

Open
Mes0903 wants to merge 6 commits into
sysprog21:masterfrom
Mes0903:split-local-ci-prebuilt
Open

Improve build system and CI/CD pipeline#140
Mes0903 wants to merge 6 commits into
sysprog21:masterfrom
Mes0903:split-local-ci-prebuilt

Conversation

@Mes0903

@Mes0903 Mes0903 commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Improve build system and CI/CD pipeline

This pull request restructures the guest artifact build flow used by semu.

The main reason for this change is the ordering problem in the old CI/CD flow. When a branch was merged into master, the device tests could request the rolling release prebuilt artifacts before the new prebuilt release had been built and published. That meant the tests could boot with stale release artifacts even though the current checkout required new guest artifacts, so the merge-time CI/CD path could fail for reasons unrelated to the emulator changes being tested.

The old PR flow also compared guest artifact inputs against the master branch. When a PR changed a guest config, every later push to that PR could trigger another source rebuild, even if the new commit did not touch any guest config or recipe input. This made PR iteration expensive because the decision was tied to branch drift instead of the exact artifact recipe required by the current checkout.

The old flow also made the build and test pipeline depend heavily on GitHub-specific behavior. Cache restore, workflow artifacts, release downloads, publish decisions, and test setup were coupled together in the workflow. This PR redraws that architecture around clearer layers: source-build recipes, artifact resolution, artifact materialization, packaging/publishing, and test consumption. GitHub Actions remains the current execution environment, but the GitHub-specific pieces are now adapters around those layers rather than the place where the artifact policy lives.

Local development stays close to Make’s file-target model. Missing guest artifacts can still be downloaded from the rolling prebuilt release, while source rebuilds are explicit through make build-artifacts. This keeps the normal local path predictable and preserves the Buildroot cache workflow developers rely on when iterating on guest configuration.

CI owns the artifact decision model. The workflow computes recipe keys for the current checkout, compares them with restored workflow-local raw artifacts and the rolling release manifest, then materializes a plan as cache reuse, release download, or source build. The resolver decides where artifacts should come from, and the materializer performs that plan.

The rolling release contract is also tightened. Recipe keys continue to decide whether a release artifact matches the current source recipe, while archive checksums verify that downloaded .bz2 bytes match the published manifest before decompression and stamping. These checks cover different failure modes, so the release manifest now records both archive checksum entries and virtual recipe-key entries.

Local build flow

make build-artifacts replaces the old image-oriented source build entrypoint with an artifact-class based flow. The local CLI selects which classes to build, handles cleanup policy, and delegates the actual source recipes to shared helpers under scripts/prebuilt/.

The default rootfs and test-tools rootfs now use separate Buildroot output directories. This keeps optional test payloads such as X11 and DirectFB2 from changing the default rootfs.cpio and derived ext4.img path. ext4.img remains a local derived artifact built from rootfs.cpio.

Test-tools payload selection is explicit. Options such as --x11 and --directfb2-test are passed directly to the build command, and DirectFB2 no longer implicitly selects X11.

CI artifact flow

CI is split into resolution and materialization. The resolver reads the current recipe inputs, restored cache stamps, and release manifest, then emits a plan for each artifact class. The materializer executes that plan without making new policy decisions.

Workflow-local raw artifacts are tracked with .stamps/prebuilt-local/*.recipe-key. Those stamps allow restored CI cache entries to be checked against the current checkout before reuse. Test jobs consume either the raw artifact bundle produced by the current workflow run or the rolling prebuilt release through the same composite action.

Publishing is part of the main workflow. After the required tests run against the same materialized artifact set, the publish job verifies the packaged artifacts and updates the rolling prebuilt release when the resolver reports release drift.

Release manifest

prebuilt.sha1 now records both compressed archive checksums and recipe-key entries. The archive checksum entries verify transport integrity for Image.bz2, rootfs.cpio.bz2, and test-tools.img.bz2. The recipe-key entries describe the source recipe that produced each artifact class.

Local release downloads verify archive bytes before decompression. CI release materialization additionally verifies that the manifest recipe key still matches the resolver plan before stamping the raw artifact as current.

Fork and mirror repositories can bootstrap from source when the rolling release manifest returns HTTP 404. Other release download failures remain visible as release or network state problems.

Documentation

The README now describes missing guest artifacts as release downloads and source guest rebuilds as an explicit make build-artifacts operation.

docs/build-system-and-ci.md documents the local build path, CI artifact contract, resolver/materializer model, package and publish flow, and fork/mirror bootstrap behavior.

The networking document is updated to point at the moved device smoke script path.


Summary by cubic

Rebuilt the guest artifact build and CI/CD pipeline around artifact classes with a recipe‑keyed resolver/materializer and verified release archive checksums to prevent stale prebuilts and speed up PRs. Added selectable ROOTFS_CPIO for reliable ext4.img on macOS and early validation of default minimal.dtb boot options.

  • Refactors

    • Local builds via make build-artifacts using scripts/prebuilt/*; separate Buildroot outputs for default rootfs vs. test-tools; explicit test-tools flags (--x11, --directfb2-test).
    • CI plans per class (use-action-cache/download-release/build) with stamps in .stamps/prebuilt-local; tests get artifacts via provide-guest-artifacts; removed the standalone prebuilt workflow.
    • Release: prebuilt.sha1 records *.bz2 checksums and *.recipe-key; downloads verify bytes and expected recipe; package/verify via .ci/prebuilt/package.sh and verify-package.sh.
    • Tooling/UX: ROOTFS_CPIO selection with Homebrew cpio on macOS; preflight checks for default minimal.dtb when using -i/-d and root=/dev/vda.
  • Migration

    • Developers: make check downloads verified prebuilts; rebuild with make build-artifacts [image|rootfs|test-tools] plus --x11/--directfb2-test, --clean-build, or --full-rebuild. On macOS, set ROOTFS_CPIO if needed.
    • Maintainers: rolling prebuilt release publishes from the main workflow on master after tests; master runs aren’t canceled by concurrency.
    • Forks/mirrors: PREBUILT_URL defaults to this repository; CI bootstraps from source when the release manifest returns 404.

Written for commit 184be80. Summary will update on new commits.

Review in cubic

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed

Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.

Re-trigger cubic

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 36 files

Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.

Re-trigger cubic

Comment thread mk/external.mk
Comment thread docs/build-system-and-ci.md Outdated
Comment thread docs/build-system-and-ci.md Outdated
@Mes0903 Mes0903 force-pushed the split-local-ci-prebuilt branch from f735efa to 40a4254 Compare June 29, 2026 05:17
@Mes0903

Mes0903 commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator Author

@cubic-dev-ai

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 29, 2026

Copy link
Copy Markdown

@cubic-dev-ai

@Mes0903 I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 37 files

Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.

Re-trigger cubic

@Mes0903 Mes0903 force-pushed the split-local-ci-prebuilt branch from 40a4254 to d182666 Compare June 29, 2026 09:48
Comment thread main.c Outdated
Comment thread scripts/build-artifacts.sh Outdated
@Mes0903 Mes0903 force-pushed the split-local-ci-prebuilt branch from d182666 to fb64e4a Compare July 2, 2026 00:40
Mes0903 added 6 commits July 2, 2026 08:51
Replace 'scripts/build-image.sh' with a source-build path organized
around artifact classes. 'scripts/build-artifacts.sh' becomes the local
user interface, while the actual source recipes live under
'scripts/prebuilt' and can be shared by local builds and CI.

The local CLI owns artifact selection and local cleanup policy. It
parses requests such as 'image', 'rootfs', 'test-tools', and 'all',
handles flags such as '--clean-build' and '--full-rebuild', and decides
whether to derive 'ext4.img'. 'scripts/prebuilt/artifact-recipes.sh'
owns the Buildroot and Linux source operations, publishes raw artifacts,
and keeps recipe-specific work out of the 'Makefile'.

Default rootfs and test-tools rootfs now use separate Buildroot output
directories. The default output produces 'rootfs.cpio' and the local
'ext4.img'. The test-tools output produces 'test-tools.img' with the
selected payloads. This keeps default rootfs artifacts stable when the
test-tools recipe adds a larger filesystem payload.

The test-tools recipe is explicit.
'scripts/prebuilt/test-tools-recipe.sh' normalizes selections such as
'x11' and 'directfb2', and 'scripts/build-artifacts.sh' accepts flags
like '--x11' and '--directfb2-test' directly. DirectFB2 and X11 are
independent recipe entries.

Local Make keeps the file-target model for guest artifacts. 'Image',
'rootfs.cpio', and 'test-tools.img' are ordinary targets from
'mk/external.mk', and missing files are fetched through the shared
release download helper. Source rebuilds are requested through
'make build-artifacts', so changing a guest config means the developer
explicitly asks the source-build path to refresh local artifacts.

Cleanup semantics are made explicit as well. Generated stamp files move
under '.stamps', 'make clean' removes Buildroot output directories,
'distclean' removes external source trees, '--clean-build' refreshes
selected outputs while keeping source checkouts and download cache, and
'--full-rebuild' starts from freshly cloned pinned revisions. This keeps
the fast local Buildroot-cache workflow available while still giving
developers a complete rebuild path when they need one.

Move recipe pins into 'artifact-recipe.env' and rename the Meson RISC-V
cross file to make its role clear. CI recipe keys own release freshness;
local Make is responsible for existing files, missing-file downloads,
and explicit source rebuilds.
Replace the old PR drift job and standalone prebuilt workflow with a
resolver/materializer model inside the main workflow. The new flow
treats guest artifacts as a planned input to the test jobs: first decide
where every class should come from, then materialize exactly that plan
before tests or publishing consume it.

The resolver computes the current recipe key for each registered
artifact class from '.ci/prebuilt/artifact-inputs.sh'. It compares those
keys with the rolling release manifest and with any
'.stamps/prebuilt-local' entries restored from the workflow-local raw
cache. The output is a 'KEY=VALUE' plan containing one action per class:
'use-action-cache', 'download-release', or 'build'.

The materializer executes that plan without making new policy decisions.
Restored cache entries are accepted when their raw artifact and
recipe-key stamp match the current recipe key. Release-backed entries
are downloaded through the shared transport helper. Build entries are
passed to '.ci/prebuilt/build-plan-artifacts.sh', which calls the same
'scripts/prebuilt' source recipes used by local 'make build-artifacts'.

The workflow-local cache stores raw artifacts plus
'.stamps/prebuilt-local'. The rolling release stores compressed '.bz2'
artifacts plus 'prebuilt.sha1'. CI uses raw artifacts for fast
same-workflow or nearby-workflow reuse, and publishes a compact rolling
release for local Make and test jobs that consume prebuilt artifacts.

Move package and verification work into '.ci/prebuilt/package.sh' and
'.ci/prebuilt/verify-package.sh'. 'guest-artifact-build' uploads raw
artifacts for test jobs and packaged artifacts for 'publish-prebuilt'.
The publish job runs on master after the required tests have exercised
the same materialized artifacts and the resolver has reported that the
rolling release needs an update.

The workflow layout is cleaned up at the same time. Device smoke scripts
move under '.ci/device-smoke', GitHub output and formatting helpers move
under '.ci/github', and 'append-github-output.sh' validates the
requested outputs before appending them. Publishing follows the main
workflow's resolved artifact plan, so the old '.ci/publish-prebuilt.sh'
and '.github/workflows/prebuilt.yml' are removed.

Add '.github/actions/provide-guest-artifacts' so Linux and macOS test
jobs consume artifacts through the same contract. Test jobs either
download a current rolling release or download the raw artifact bundle
from the current workflow run. 'PREBUILT_FORBID_BUILD' keeps test jobs
from compiling guest artifacts after the build job has already made the
release/cache/build decision.
Extend the prebuilt release manifest so it records transport integrity
as well as source recipe state. 'prebuilt.sha1' now contains sha1sum
entries for each compressed artifact, followed by the virtual
'*.recipe-key' entries used by the resolver. The two kinds of entries
answer different questions and are intentionally kept in the same
manifest.

Recipe keys decide freshness. The resolver compares the current class
recipe key with the release recipe key to decide whether the rolling
release matches the current checkout. Archive checksums verify that the
downloaded '.bz2' bytes match the manifest the resolver and materializer
are using.

Update '.ci/prebuilt/package.sh' to write archive checksum entries for
every class output before writing the recipe-key entries. Update
'.ci/prebuilt/verify-package.sh' to require both entry types and to
recompute each compressed archive checksum before publish. This catches
packaging mistakes before the rolling release is updated.

Make 'scripts/prebuilt/download-release-artifact.sh' fetch
'prebuilt.sha1' before downloading an artifact. The helper verifies the
archive checksum before decompression and can also compare the manifest
recipe key with an expected key supplied by CI. Local Make gets byte
verification for downloaded release artifacts, and CI gets an extra
guard against release-manifest or asset drift between resolve and
materialize.

Pass the expected class recipe key from
'.ci/prebuilt/materialize-artifacts.sh' when materializing a
'download-release' action. After the helper verifies the archive and
recipe key, the materializer decompresses the raw artifact and stamps
'.stamps/prebuilt-local' with the current recipe key. A mismatch fails
before stale bytes are marked as current.

Update '.ci/prebuilt/resolve-artifacts.sh' so release recipe keys are
accepted from manifests that contain archive checksum entries for the
class outputs. A manifest that lacks those entries is outside the
current release contract and resolves as stale, which drives CI toward
cache or source build.

Remove the unused 'release_manifest_available' output from the resolver
and the materializer's ignored-key list. Also document the bootstrap
boundary in code: forks and mirrors may bootstrap from source when
'prebuilt.sha1' returns HTTP 404, while network failures and other HTTP
errors remain fatal so external release-state problems stay visible.
Add 'docs/build-system-and-ci.md' as the build-system and CI overview.
The document follows the execution flow through local Make file targets,
explicit source builds, the shared artifact class contract, CI recipe
keys, resolver decisions, materialization, packaging, testing,
publishing, and fork/mirror bootstrap behavior.

Update 'README.md' to match the same model. The first-run instructions
describe missing guest artifacts as release downloads, local downloads
verify the archive checksum in 'prebuilt.sha1', and source rebuilds stay
explicit through 'make build-artifacts'. The artifact-build section
describes local Make behavior and CI recipe keys as separate
responsibilities.

Update 'docs/networking.md' after moving device smoke scripts under
'.ci/device-smoke'.
PR sysprog21#141 discussed a kernel panic where Linux saw 'vda' from
'virtio-blk' but still reported root device "". The manual run
used a DTB generated for the legacy initramfs path, so its
bootargs did not provide 'root=/dev/vda'. The command only
attached '-d ext4.img' and did not pass '-i rootfs.cpio'.

The default 'make check' paths already avoid this. External-root
builds generate bootargs with 'root=/dev/vda' and pass '-d ext4.img';
the fakeroot fallback and explicit 'ENABLE_EXTERNAL_ROOT=0' path pass
'-i rootfs.cpio' together with the initramfs DTB.

Keep custom DTBs outside these default 'minimal.dtb' checks. For
the built-in default-DTB cases, validate the option combination
before boot so manual invocations get an actionable 'semu' error
instead of reaching the guest VFS panic.
macOS runners can fail to materialize 'ext4.img' when the default
'cpio' is not compatible with the fakeroot environment used by
'rootfs_ext4.sh'. The build system already decides whether external
root boot is enabled, so the image helper should report missing tools
and let callers choose the cpio implementation instead of embedding
host-specific fallback policy.

Add 'ROOTFS_CPIO' as the Makefile and script interface, and pass it
through the fakeroot child with the other explicit environment inputs.
The helper now validates both 'ROOTFS_CPIO' and 'MKFS_EXT4', keeps cpio
extraction and mkfs in one fakeroot process, and checks that the
extracted rootfs contains executable '/sbin/init' before tolerating
non-fatal cpio warnings.

Use the interface in the macOS CI setup by installing Homebrew 'cpio'
and exporting its binary path. The default remains 'cpio' for normal
local builds.
@Mes0903 Mes0903 force-pushed the split-local-ci-prebuilt branch from fb64e4a to 184be80 Compare July 2, 2026 00:53
local inode

inode=$(ls -id "$BUILDROOT_LOCK_DIR" 2>/dev/null) || inode=missing
inode=${inode%% *}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the OS can recycle pid and reuse inodes, is there a risk that this might lead to subtle race conditions?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this again and the concern is valid. This lock is used to serialize source builds that touch the shared buildroot/ tree. The outer artifact flow may request Image, rootfs.cpio, and test-tools.img separately, but all of those paths can use or mutate Buildroot state such as the checkout, config, host toolchain, and output directories. The lock is there so those sections do not run concurrently while still allowing Buildroot itself to use its own internal parallelism.

The difficult part is automatic stale-lock recovery. The current code uses the lock directory inode plus the recorded PID as a best-effort identity before removing a stale lock. That is not a strict guarantee because both inode numbers and PIDs can be reused. More importantly, with portable shell/filesystem operations we do not have an atomic "remove this lock only if it is still owned by the same observed owner" primitive. A separate read/compare followed by rm can still race with another process that removed the stale lock and then created a fresh one.

The stale-lock case we are trying to recover from should only happen when an artifact source build is interrupted after acquiring the lock, for example when the user kills the build, the shell exits unexpectedly, or a CI runner is terminated at the wrong time. That should be much less common than normal lock contention between concurrent artifact builds.

Given that, I plan to remove the best-effort automatic stale-lock deletion. The portable path will use an atomic symlink lock whose target contains an owner token, so acquiring the lock and publishing the owner identity happen in one operation. Normal release will remove the symlink only when the token still matches the current process. If another process observes a stale owner, it will print an actionable diagnostic asking the user/CI job to remove the lock after confirming that no artifact source build is still running, instead of deleting the lock automatically.

This keeps the common path cross-platform and avoids relying on reusable PID/inode identity for correctness.

@visitorckw

Copy link
Copy Markdown
Collaborator

This is a huge.
I'm curious if we might be able to leverage some existing standard gitHub actions like actions/cache or actions/upload-artifact to simplify parts of this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants