fix(packaged): reclaim stale namespace sidecars before launch (#4441) #4531
YOMXXX wants to merge 1 commit into
…o#4441)

An older packaged runtime can leave a still-alive daemon or web sidecar bound to /tmp/open-design/ipc/<namespace>/<app>.sock. Because the leftover process answers the new sidecar's stale-socket probe, the probe treats the socket as healthy and never unlinks it, so the new sidecar dies with EADDRINUSE while the launcher times out on a generic status wait that names the wrong process.

- Add reclaimStaleNamespaceSidecars(): before spawning, stop leftover same-namespace daemon/web sidecars (excluding the current process tree) and clear their IPC sockets, so prepareIpcPath can then unlink the now-dead socket cleanly. Reuses the platform stamp-discovery primitives.
- Parametrize waitForStatus with an app label and watch the web child, so an EADDRINUSE web sidecar surfaces its real failure and log path immediately instead of a 35s timeout labelled "daemon".
- Cover both behaviours with packaged unit tests.
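The selection step in the first bullet can be sketched as a pure function. This is only an illustration under stated assumptions: the `StampedProcess` shape, the `currentTreePids` and `selectStaleSidecars` names, and the reading of "current process tree" as the launcher plus its descendants and ancestors are all hypothetical, and the real reclaimStaleNamespaceSidecars() relies on platform stamp-discovery primitives that are not shown in this thread.

```typescript
// Hypothetical record for a discovered sidecar process (not the PR's real type).
interface StampedProcess {
  pid: number;
  ppid: number;
  namespace: string;
  app: string; // e.g. "daemon", "web", "desktop"
}

// Only daemon and web sidecars are candidates for reclaim.
const RECLAIMABLE_APPS = new Set<string>(["daemon", "web"]);

// Assumption: the "current process tree" is the launcher itself, its
// descendants, and its ancestors. Descendants are collected first so that
// siblings hanging off a shared ancestor are NOT protected by accident.
function currentTreePids(all: StampedProcess[], selfPid: number): Set<number> {
  const tree = new Set<number>([selfPid]);

  // Descendants: keep sweeping until no new child is found.
  let grew = true;
  while (grew) {
    grew = false;
    for (const p of all) {
      if (tree.has(p.ppid) && !tree.has(p.pid)) {
        tree.add(p.pid);
        grew = true;
      }
    }
  }

  // Ancestors: walk the parent chain as far as the table knows it.
  const byPid = new Map<number, StampedProcess>();
  for (const p of all) byPid.set(p.pid, p);
  let cursor = byPid.get(selfPid);
  while (cursor !== undefined && !tree.has(cursor.ppid)) {
    tree.add(cursor.ppid);
    cursor = byPid.get(cursor.ppid);
  }
  return tree;
}

// Pick the leftover same-namespace daemon/web sidecars that should be stopped.
function selectStaleSidecars(
  all: StampedProcess[],
  namespace: string,
  selfPid: number,
): StampedProcess[] {
  const protectedPids = currentTreePids(all, selfPid);
  return all.filter(
    (p) =>
      p.namespace === namespace &&
      RECLAIMABLE_APPS.has(p.app) &&
      !protectedPids.has(p.pid),
  );
}
```

A caller would then stop each selected process and clear its socket before spawning the fresh sidecars; keeping the selection pure makes the namespace-isolation and self-tree-exclusion rules easy to unit test.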
Hi @YOMXXX! Thanks for digging into this. The write-up around the stale live-sidecar vs dead-socket gap made the startup failure mode very easy to follow. I've queued the PR through the normal reviewer path and linked it back to #4441 so the bug trail stays in one place.
PerishCode left a comment
@YOMXXX I reviewed the changed ranges in apps/packaged/src/sidecars.ts and apps/packaged/tests/sidecars.test.ts. The reclaim path stays on the shared sidecar/platform primitives, scopes termination to same-namespace packaged daemon/web sidecars while protecting the current process tree, and the new tests cover the stale sidecar reclaim, namespace isolation, self-tree exclusion, and web fast-fail diagnostic behavior.
I could not rerun the local package commands in this prepared checkout because node_modules is missing (vitest, tsc, and tsx were unavailable), but the live PR checks show the workspace/static gates and unit-test job passing for this head. Thanks for tightening this startup failure mode and making the diagnostics more direct.
Fixes #4441
Why
I hit this while exercising packaged Nightly upgrade/reinstall flows: the app appeared to crash on launch, and the desktop log only showed a generic "timed out waiting for sidecar status" against web.sock. Digging in, the real cause was a still-alive web sidecar from the previous build (0.10.1) still bound to /tmp/open-design/ipc/release-nightly/web.sock.

The pain: the existing stale-socket cleanup (prepareIpcPath / staleUnixSocketExists) only handles a socket left behind by a dead process. It probes with a connection and, when a leftover process is still alive and answers, judges the socket "healthy" and never unlinks it. The new sidecar then dies with EADDRINUSE, and because the web waitForStatus call passed no child watch, it sat through the full 35s timeout and reported the wrong process. This is the P1/risk-high startup robustness gap in #4441: a live older-version sidecar is semantically stale for the namespace but was never reclaimed.

What users will see
Upgrading or reinstalling a packaged build no longer fails to open when an older runtime left a sidecar running. The launcher now reclaims the namespace on startup (stops the leftover daemon/web sidecars and clears their sockets) before spawning fresh ones. If a sidecar still can't start, the desktop log now names the failing process and points at its log instead of a generic 35s timeout.
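The fast-fail diagnostic can be pictured as a race between status polling and the child's exit. The sketch below is an assumption-laden illustration, not the PR's waitForStatus: the `WatchedChild` shape, the `waitForStatusSketch` name, its parameters, and the error wording are all hypothetical.

```typescript
type AppLabel = "daemon" | "web";

// Hypothetical handle for a spawned sidecar (not the PR's real type).
interface WatchedChild {
  exited: Promise<number | null>; // resolves with the exit code once the child exits
  logPath: string;
}

async function waitForStatusSketch(
  app: AppLabel,
  pollStatus: () => Promise<boolean>, // true once the sidecar reports status
  child: WatchedChild,
  timeoutMs: number,
  intervalMs = 50,
): Promise<void> {
  let settled = false;

  // If the child exits first, fail right away and name the right process.
  const exitFailure = child.exited.then((code): never => {
    throw new Error(
      `${app} sidecar exited with code ${code} before reporting status; see log at ${child.logPath}`,
    );
  });

  // Otherwise keep polling until the status arrives or the deadline passes.
  const polling = (async (): Promise<void> => {
    const deadline = Date.now() + timeoutMs;
    while (!settled) {
      if (await pollStatus()) return;
      if (Date.now() >= deadline) {
        throw new Error(`timed out waiting for ${app} sidecar status`);
      }
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  })();

  // The losing branch may still reject later; swallow that so it does not
  // surface as an unhandled rejection.
  exitFailure.catch(() => {});
  polling.catch(() => {});

  try {
    await Promise.race([polling, exitFailure]);
  } finally {
    settled = true; // stops the polling loop after either outcome
  }
}
```

With a child that has already exited, the call rejects on the next tick with a message naming `web` and its log path instead of waiting out the full timeout; with a healthy child, it resolves as soon as `pollStatus` returns true.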
Surface area
Screenshots
N/A — no UI.
Bug fix verification
apps/packaged/tests/sidecars.test.ts
- reclaimStaleNamespaceSidecars > stops leftover same-namespace daemon and web sidecars and clears their sockets (+ different-namespace and self-tree-exclusion cases)
- waitForStatus app label > names the web sidecar when its child exits before reporting status

… main and green on this branch? yes: red as "reclaimStaleNamespaceSidecars is not a function" and as the exit error naming daemon instead of web; green after the fix.

… /tmp/open-design/ipc/<namespace>/web.sock, then launch the new build; it should reclaim and start instead of timing out.

Validation
- pnpm --filter @open-design/packaged test → tests/sidecars.test.ts 28 passed (the 4 new specs RED→GREEN).
- tsc -p tsconfig.json --noEmit / tsconfig.tests.json → no errors in sidecars.ts.