Skip to content

Delay AMO rd write until store succeeds#141

Open
shengwen-tw wants to merge 1 commit into
sysprog21:masterfrom
shengwen-tw:fix-amo
Open

Delay AMO rd write until store succeeds#141
shengwen-tw wants to merge 1 commit into
sysprog21:masterfrom
shengwen-tw:fix-amo

Conversation

@shengwen-tw

@shengwen-tw shengwen-tw commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Overview

Do not commit the AMO destination register before the store succeeds.

AMO_OP wrote rd before confirming that mmu_store() succeeded. If the store raised a fault, such as a COW/page fault, the destination register could be committed even though the guest kernel would handle the fault and retry the same instruction.

This patch delays the rd write until after mmu_store() returns and vm->error is known to be clear.

Problem Statement

Guest GDB can hang for:

gdb -q -batch -ex "set debuginfod enabled off" -ex run --args /bin/true

AI-agent investigation reduced the hang to GDB's default worker-thread path and identified that glibc's malloc at fork unlock path was failing to release its lock:

amoswap.w.rl a5,a5,(a4)

At that point, a4 points at glibc malloc's list_lock, a5 is the value to store (0, meaning unlocked), and the current lock value is 1 (locked). The instruction swaps the value in a5 into the lock and returns the old lock value back in a5.

before: list_lock = 1, a5 = 0

On the broken path, single-stepping showed that a5 still receives the old lock value, but the store did not clear the lock:

broken after: list_lock = 1, a5 = 1
(normal after: list_lock = 0, a5 = 1)

The underlying issue is the AMO implementation:

#define AMO_OP(STORED_EXPR)                                   \
...
set_dest_idx(vm, decoded_rd(decoded), value);
mmu_store(vm, addr, RV_MEM_SW, (STORED_EXPR), false);
if (vm->error)
    return;
...

mmu_store() can fault, for example on a COW/page fault after fork(). Therefore, the old implementation can corrupt that register before the fault is handled

Fix

Compute the store value first, perform the store, check for an error, and only then commit rd:

uint32_t stored = (STORED_EXPR);
mmu_store(vm, addr, RV_MEM_SW, stored, false);
if (vm->error)
    return;
set_dest_idx(vm, decoded_rd(decoded), value);

Testing

scripts/buildroot-enable-gdb.sh is a testing helper that appends the target GDB options needed to Buildroot config.

#!/usr/bin/env bash
set -euo pipefail

CONFIG_FILE="${1:-configs/buildroot.config}"

cat >>"$CONFIG_FILE" <<'EOF'

# Target-side GDB reproduction support.
BR2_TOOLCHAIN_BUILDROOT_CXX=y
BR2_INSTALL_LIBSTDCPP=y
BR2_ENABLE_DEBUG=y
# BR2_STRIP_strip is not set
BR2_PACKAGE_GDB=y
BR2_PACKAGE_GDB_DEBUGGER=y
BR2_PACKAGE_GDB_SERVER=y
BR2_PACKAGE_HOST_GDB=y
EOF

echo "Appended target-side GDB reproduction options to $CONFIG_FILE"

Build a GDB-enabled guest:

chmod +x ./scripts/buildroot-enable-gdb.sh
./scripts/buildroot-enable-gdb.sh
./scripts/build-image.sh --all --no-ext4
./scripts/rootfs_ext4.sh rootfs.cpio ext4.img 256
make semu -j$(nproc)

Boot:

./semu -k Image -c 1 -b minimal.dtb -H -d ext4.img

Inside the guest:

gdb -q -batch -ex "set debuginfod enabled off" -ex run --args /bin/true
echo GDB_TRUE_EXIT:$?

Expected behavior:

Before this patch: hangs
After this patch: exits normally with GDB_TRUE_EXIT:0


Summary by cubic

Delay the AMO destination register write until after mmu_store() succeeds to prevent committing rd on a fault. Fixes a guest GDB hang triggered by amoswap.w.rl in glibc’s malloc lock path.

  • Bug Fixes
    • In AMO_OP, compute the store value, call mmu_store, check vm->error, then write rd.
    • Prevents rd corruption on COW/page faults, restoring correct swap semantics and normal GDB exit.

Written for commit deadf03. Summary will update on new commits.

Review in cubic

@shengwen-tw shengwen-tw marked this pull request as ready for review June 23, 2026 12:24
@shengwen-tw

Copy link
Copy Markdown
Collaborator Author

@cubic-dev-ai

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 23, 2026

Copy link
Copy Markdown

@cubic-dev-ai

@shengwen-tw I have started the AI code review. It will take a few minutes to complete.

@shengwen-tw shengwen-tw requested a review from jserv June 23, 2026 12:26

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 1 file

Re-trigger cubic

Do not commit the AMO destination register before the store succeeds.

`AMO_OP` wrote `rd` before confirming that `mmu_store()` succeeded. If the
store raised a fault, such as a COW/page fault, the destination register could
be committed even though the guest kernel would handle the fault and retry the
same instruction.

This patch delays the `rd` write until after `mmu_store()` returns and
`vm->error` is known to be clear.
@Cuda-Chen

Copy link
Copy Markdown
Collaborator

Hi @shengwen-tw ,

I got a kernel panic when executing ./semu -k Image -c 1 -b minimal.dtb -H -d ext4.img:

$ ./semu -k Image -c 1 -b minimal.dtb -H -d ext4.img
...
[    0.023338] /dev/root: Can't open blockdev
[    0.023341] VFS: Cannot open root device "" or unknown-block(0,0): error -6
[    0.023344] Please append a correct "root=" boot option; here are the available partitions:
[    0.023347] fe00          262144 vda 
[    0.023348]  driver: virtio_blk
[    0.023350] List of all bdev filesystems:
[    0.023352]  ext3
[    0.023353]  ext4
[    0.023354]  fuseblk
[    0.023356] 
[    0.023358] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    0.023361] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.94 #1
[    0.023364] Hardware name: semu (DT)
[    0.023366] Call Trace:
[    0.023368] [<c0004aa4>] dump_backtrace+0x2c/0x3c
[    0.023371] [<c04794ac>] show_stack+0x40/0x54
[    0.023374] [<c047efdc>] dump_stack_lvl+0x74/0xa8
[    0.023377] [<c047f02c>] dump_stack+0x1c/0x2c
[    0.023380] [<c047985c>] panic+0x128/0x368
[    0.023383] [<c0488cdc>] mount_root_generic+0x214/0x2a8
[    0.023386] [<c0488ddc>] mount_root+0x6c/0x1d4
[    0.023390] [<c0489148>] prepare_namespace+0x204/0x258
[    0.023393] [<c04885a0>] kernel_init_freeable+0x1dc/0x220
[    0.023397] [<c04803ac>] kernel_init+0x24/0x120
[    0.023400] [<c0487294>] ret_from_exception_end+0x14/0x20
[    0.023404] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---

@Mes0903

Mes0903 commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Hi @Cuda-Chen ,

I checked your reply locally. I think the panic is caused by a DTB/runtime boot-mode mismatch, not by the AMO change in this PR.

The key line is:

VFS: Cannot open root device "" or unknown-block(0,0)

The same log also lists vda from virtio_blk, so the disk is visible to the kernel. The missing part is a root= argument in /chosen/bootargs.

I reproduced the same class of failure by taking a known-good minimal.dtb, removing root=/dev/vda from /chosen/bootargs, and running:

timeout 30 ./semu -k ../blob/Image -c 1 -b /tmp/semu-minimal-external-no-root-quiet.dtb -H -d ../blob/test-tools.img

That reaches the same VFS panic. With the normal external-root DTB:

fdtget -t s minimal.dtb /chosen bootargs

I get:

earlycon console=ttyS0 root=/dev/vda rw rootwait quiet loglevel=3 lpj=260000

and the same launch path boots to the Buildroot login prompt.

So the likely local condition is that minimal.dtb was generated for the legacy/initramfs path, for example with ENABLE_EXTERNAL_ROOT=0 or after the fakeroot fallback, or that it is stale from an older build. In that mode the DTB needs -i rootfs.cpio; running it with only -d ext4.img leaves the kernel with root device "".

Maybe we can try to improve the behavior in #140.

Mes0903 added a commit to Mes0903/semu that referenced this pull request Jun 26, 2026
PR sysprog21#141 discussed a kernel panic where Linux saw 'vda' from
'virtio-blk' but still reported root device "". The manual run
used a DTB generated for the legacy initramfs path, so its
bootargs did not provide 'root=/dev/vda'. The command only
attached '-d ext4.img' and did not pass '-i rootfs.cpio'.

The default 'make check' paths already avoid this. External-root
builds generate bootargs with 'root=/dev/vda' and pass '-d ext4.img';
the fakeroot fallback and explicit 'ENABLE_EXTERNAL_ROOT=0' path pass
'-i rootfs.cpio' together with the initramfs DTB.

Keep custom DTBs outside these default 'minimal.dtb' checks. For
the built-in default-DTB cases, validate the option combination
before boot so manual invocations get an actionable 'semu' error
instead of reaching the guest VFS panic.
Mes0903 added a commit to Mes0903/semu that referenced this pull request Jun 29, 2026
PR sysprog21#141 discussed a kernel panic where Linux saw 'vda' from
'virtio-blk' but still reported root device "". The manual run
used a DTB generated for the legacy initramfs path, so its
bootargs did not provide 'root=/dev/vda'. The command only
attached '-d ext4.img' and did not pass '-i rootfs.cpio'.

The default 'make check' paths already avoid this. External-root
builds generate bootargs with 'root=/dev/vda' and pass '-d ext4.img';
the fakeroot fallback and explicit 'ENABLE_EXTERNAL_ROOT=0' path pass
'-i rootfs.cpio' together with the initramfs DTB.

Keep custom DTBs outside these default 'minimal.dtb' checks. For
the built-in default-DTB cases, validate the option combination
before boot so manual invocations get an actionable 'semu' error
instead of reaching the guest VFS panic.
@Cuda-Chen

Copy link
Copy Markdown
Collaborator

Hi @Mes0903 ,

Thanks for your assistance.
After building semu with ENABLE_EXTERNAL_ROOT=0 I can launch semu.

For @shengwen-tw ,
I confirm gdb will exit with GDB_TRUE_EXIT:0.

@Cuda-Chen Cuda-Chen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM.

Mes0903 added a commit to Mes0903/semu that referenced this pull request Jul 2, 2026
PR sysprog21#141 discussed a kernel panic where Linux saw 'vda' from
'virtio-blk' but still reported root device "". The manual run
used a DTB generated for the legacy initramfs path, so its
bootargs did not provide 'root=/dev/vda'. The command only
attached '-d ext4.img' and did not pass '-i rootfs.cpio'.

The default 'make check' paths already avoid this. External-root
builds generate bootargs with 'root=/dev/vda' and pass '-d ext4.img';
the fakeroot fallback and explicit 'ENABLE_EXTERNAL_ROOT=0' path pass
'-i rootfs.cpio' together with the initramfs DTB.

Keep custom DTBs outside these default 'minimal.dtb' checks. For
the built-in default-DTB cases, validate the option combination
before boot so manual invocations get an actionable 'semu' error
instead of reaching the guest VFS panic.
Mes0903 added a commit to Mes0903/semu that referenced this pull request Jul 2, 2026
PR sysprog21#141 discussed a kernel panic where Linux saw 'vda' from
'virtio-blk' but still reported root device "". The manual run
used a DTB generated for the legacy initramfs path, so its
bootargs did not provide 'root=/dev/vda'. The command only
attached '-d ext4.img' and did not pass '-i rootfs.cpio'.

The default 'make check' paths already avoid this. External-root
builds generate bootargs with 'root=/dev/vda' and pass '-d ext4.img';
the fakeroot fallback and explicit 'ENABLE_EXTERNAL_ROOT=0' path pass
'-i rootfs.cpio' together with the initramfs DTB.

Keep custom DTBs outside these default 'minimal.dtb' checks. For
the built-in default-DTB cases, validate the option combination
before boot so manual invocations get an actionable 'semu' error
instead of reaching the guest VFS panic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants