feat(platform-wallet)!: shared ThreadRegistry for coordinator lifecycle + shutdown UAF/data-loss fixes#3954
feat(platform-wallet)!: shared ThreadRegistry for coordinator lifecycle + shutdown UAF/data-loss fixes#3954Claudius-Maginificent wants to merge 318 commits into
Conversation
…tBytes, never raw bytes (CMT-002)
Add `SecretStore` as the public, never-leaking secrets entry point.
`get` yields a zeroizing `SecretBytes` (a raw `Vec<u8>` never crosses the
boundary); `set` takes `&SecretBytes` so callers cannot pass an unwrapped
buffer. The `File` arm delegates to new inherent typed methods on
`EncryptedFileStore`, returning `FileStoreError` losslessly so
`WrongPassphrase` vs `Corruption` vs `Busy` stay distinct. The `Os` arm
projects `keyring_core::Error` best-effort into the new
`FileStoreError::OsKeyring { kind }` payload-free discriminant. The
internal `CredentialApi`/`CredentialStoreApi` SPI impls are unchanged;
`SecretStore` wraps them.
Docs (SECRETS.md, lib.rs, secrets/mod.rs) present `SecretStore` as the
consumer front door with keyring_core as the internal SPI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed-path error distinction Drop the boxed-marker recovery in `From<FileStoreError> for keyring_core::Error`: the SPI seam is now lossy and string-only, with no `Box<dyn>` round-trip. The lossless `WrongPassphrase`/`Corruption`/`Busy` distinction lives on the typed `SecretStore` path. Repoint the in-crate SPI tests that recovered the typed error through `NoStorageAccess` onto the typed path, asserting only the lossy projection at the seam. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ore NoStorageAccess for lossless SPI recovery Revert the string-only `From<FileStoreError> for keyring_core::Error`: `WrongPassphrase` / `Busy` now box the single typed `FileStoreError` itself into `NoStorageAccess`, so external keyring_core-SPI consumers recover the variant losslessly via `source().downcast_ref::<FileStoreError>()`. No second type is reintroduced (FileStoreFailure stays deleted), satisfying the original error.rs objection. The `BadStoreFormat` group has no box slot, so it carries only a secret-free string and stays fully typed on the `SecretStore` path. Seam tests assert downcast recovery and the secret-free BadStoreFormat rendering. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…re __secrets-test-helpers (CMT-008) The in-RAM MemoryCredentialStore test double had no consumer outside its own module. Its behaviors (label rejection, namespacing, zeroizing storage) are already covered by the tempdir-backed EncryptedFileStore tests, so the store and its dedicated `__secrets-test-helpers` feature are retired. The dev-dependency self-reference uses `__test-helpers`, not the secrets one, so nothing else needs it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pose_secret guard scan (CMT-012/010)
CMT-012: `hex::decode_to_slice` accepts uppercase, so `parse_service`
now rejects any A-F before decoding — the service string is always
constructed lowercase via `WalletId::to_hex`, making lowercase a clean
parse invariant. Adds a test that an uppercase-hex service is rejected
and the lowercase form of the same bytes is accepted.
CMT-010: the expose_secret leak guard joined only a 2-line window, so a
3+-line `tracing::…(field = expose_secret(), …)` call slipped through.
The scan now groups whole statements (concatenating until parens
balance and a `;`/`{` is seen) so the sink and `expose_secret` land in
one window. Adds a non-vacuous planted 3-line case the widened scan
catches and the old 2-line window would have missed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ader (CMT-013/014) CMT-013: removes internal finding-ID and rework narration (SEC-00N, EDIT-N, CMT-NNN, "trimmed fork of", "the defect: used to…") from comments across src/secrets/, keeping the present-state behavior and requirement-spec rationale. Comments describe what IS, not the journey. CMT-014: removes the embedded MIT license-text block atop secret.rs (first-party, same org, matching license) and replaces the module header with a one-line doc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… failures (Display-only, secret-free)
Library-idiom + security-event logging only; no blanket error! at
routine return sites.
- Swallowed mlock failures in secret.rs (3 sites) move from debug! to
warn!: they are .ok()-swallowed and caller-invisible, yet
security-relevant (the secret may be swappable to disk or land in a
core dump). Display `{e}` only.
- Corruption/tamper detected in get()/rekey() (post-verify AEAD tag
failure → Corruption): error! with the non-secret wallet-id/label,
Display only, never the secret or the raw keyring error.
- Vault write failure in write_vault: warn! with the io error's
Display; paths are caller-supplied non-secret.
Never `{:?}` a keyring_core::Error and never log a secret wrapper; all
new lines use `%` Display, so the EDIT-2 no-debug-format guard still
passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ShieldedChangeSet, SubwalletId, and ShieldedNote now derive serde::Serialize/Deserialize behind the `serde` feature. Without them the serde+shielded feature combo failed to compile: the serde-derived PlatformWalletChangeSet carries a cfg(shielded) Option<ShieldedChangeSet> field whose type had no serde impl. This is the combo built by the macOS CI lane, which was red on this PR head. The leaf types are plain POD ([u8;32]/u32/u64/bool/Vec<u8>), and the shielded delta is meant to be persisted for cold-start rehydration, so deriving (not #[serde(skip)]) is the correct fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_secrets-test-helpers references (QA-002) The `__secrets-test-helpers` feature and its `secrets::MemoryCredentialStore` in-RAM test double were removed in the keyring_core SPI rework. Remove the stale feature row from the README Cargo features table and replace the obsolete backend bullet in SECRETS.md with the current test pattern: a tempdir-backed `EncryptedFileStore` (or `SecretStore::file`) constructed via `tempfile::tempdir()`, available under the default `secrets` feature with no special flag required. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`cargo clippy -p platform-wallet-storage --no-default-features --features sqlite,cli` failed with three dead-code errors: `load_state`, `decode_pair_key`, and the non-test `ContactsRecords` arm. On this PR standalone their only consumer is the `__test-helpers` wrapper `load_state_for_test`; the production reader that will consume them ships with the separate rehydration feature. Gate the reader, its key decoder, `ContactsRecords`, and the imports they pull behind `__test-helpers` (collapsing the duplicate `ContactsRecords` definitions into one). The off-state build now goes clean; the default `--all-targets` build (which pulls test-helpers via dev-dep) and the prod `sqlite,cli,secrets` combo stay clean. Un-gate when the rehydration feature adds the production consumer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…llapse V002 (CMT-001/006) Rewrite V001 as hand-written CREATE TABLE ... PRIMARY KEY ... FOREIGN KEY SQL with native ON DELETE CASCADE, run through refinery. Drop the barrel dependency (it can't emit composite FK clauses for sqlite3) and delete the FK-emulation triggers plus the whole of V002 — native FK enforces INSERT / UPDATE re-parenting / DELETE-cascade across all 18 child tables, including the composite identity_keys/dashpay_profiles -> identities FK. Add a single open_conn choke-point (src/sqlite/conn.rs) that every connection-open site routes through; for read-write handles it enables PRAGMA foreign_keys=ON and reads it back, hard-erroring with the new WalletStorageError::ForeignKeysNotEnforced if the linked SQLite ignores the pragma (R-LOW: no compile-time knob can force it). R2: core_utxos.spent_in_txid clearing to NULL on transaction delete stays a single trigger — a native composite ON DELETE SET NULL nulls every FK column and wallet_id is NOT NULL, so SQLite would reject the delete (1299). The single-column trigger preserves the lazy semantics. Also harden wallet_metadata.birth_height read against u32 truncation (CMT-014) and the backup destination against TOCTOU via create_new. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…count UTXOs (CMT-003/011) core_derived_addresses gains an account_index column, populated from the derived-address path via the now-pub(crate) accounts::account_index helper. apply() writes derived addresses BEFORE UTXOs in the same transaction so the UTXO writer can resolve each UTXO's owning account by its rendered address (SELECT account_index FROM core_derived_addresses WHERE wallet_id=? AND address=?), replacing the hardcoded 0. A miss warns and defaults to 0. The spent-only synthetic path (CMT-011) uses the same lookup. Sync-height reads now u32::try_from instead of `as u32` (CMT-012). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…entity-key writes (CMT-005/004) platform_addrs::apply rejects an entry whose wallet_id differs from the flush scope with the typed WalletIdMismatch before the INSERT (the native FK now also backstops it, but the typed error pinpoints the mismatch). identity_keys::apply rejects an upsert whose entry (identity_id, key_id, wallet_id) disagrees with its map key / flush scope (IdentityKeyEntryMismatch), so the typed columns and the serialized blob can never describe different rows on disk. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…peek errors (CMT-007/009/010) Remove the delete-wallet subcommand (CLI surface only; the library delete_wallet API stays). peek_schema_version now returns Result and distinguishes a fresh DB (no history table -> Ok(None)) from a transient open/query failure (Err), so `migrate` can no longer print a wrong `applied: N` from a swallowed error. run_backup defers refuse-to-overwrite to backup_to's typed BackupDestinationExists instead of a duplicate is_file() guard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…i-account, overflow guards) New tests/sqlite_hardening_3625.rs proves: native FK rejects an orphan child and an identity_keys row with no identities parent; mixed-wallet platform-address write fails typed (CMT-005); identity-key entry/key-vs -blob mismatch is rejected (CMT-004); multi-account UTXOs bucket to their real account and an undeclared address defaults to 0 (CMT-003/011); the birth_height and sync_height overflow cases error rather than truncate (CMT-012/014); a compaction-marker-only wallet survives load (CMT-002). Update existing tests for the rewritten schema: TC-027 derived-address insert carries account_index; the error-classification exhaustiveness table covers the three new variants; TC-072 asserts the delete-wallet subcommand is gone; FK test header drops the trigger wording. A conn.rs unit test exercises the foreign-key read-back assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ope conn doc (QA-001/004) TC-048 now explicitly asserts the UTXO row survives the transaction delete (COUNT==1) with wallet_id/value/account_index preserved, in addition to spent_in_txid being NULL — so a future change turning the single-column trigger into a cascading DELETE fails loudly instead of passing implicitly. Scope the conn.rs choke-point doc to library connection paths and note the CLI's read-only peek_schema_version probe opens directly (no mutations, and open_conn is pub(crate) so the separate bin target can't reach it). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ency CMT-003's pure-SQL account_index lookup removed the last reference to key-wallet-manager; cargo machete (macOS CI) flagged it as unused. No `.rs` file references key_wallet_manager under any cfg/feature, and the optional dep carried no extra feature activation, so removal is clean — not papered over with a machete ignore. Drop the dependency line and its `dep:key-wallet-manager` wiring from the sqlite feature. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cOS path symmetry `backup_to` canonicalizes its returned path; on macOS the temp dir lives under `/var` (symlink to `/private/var`), so the canonical `written` path no longer prefixes the un-canonicalized `out_dir` and the `starts_with` assertion failed (Linux has no such symlink, hence green locally). Canonicalize the expected dir before the check, mirroring TC-032. TC-031 is the only affected assertion — TC-032 was already symmetric, and TC-035 only feeds the canonical path back into restore_from without comparing it to a temp path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…L/SHM perms (CMT-001/003/004) CMT-001 (BLOCKING data-loss): restore_from() unlinked <dest>-wal / -shm BEFORE staging + integrity / schema / max-version checks. A failed gate left the main DB intact (NamedTempFile drop) but the sidecars were gone — un-checkpointed pages lost. Move the unlink to immediately BEFORE tmp.persist() so both succeed atomically or neither runs. CMT-003 (HIGH security): the create_new() call in run_to() created backup files at the umask-default mode before apply_secure_permissions chmodded them to 0o600. Set OpenOptionsExt::mode(0o600) under cfg(unix) so the file is never world/group readable, even briefly. Keep the post-open chmod as defense-in-depth (Windows no-op, re-tightens). CMT-004 (HIGH security): apply_secure_permissions only chmodded the main DB; WAL/SHM siblings inherited the process umask. Extend the same helper to chmod <path>-wal / <path>-shm to 0o600 when present (silent skip when absent, Windows no-op). One function, idempotent — matches the user preference for a single API surface. Regressions: - tests/sqlite_restore_staged_validation::rejected_restore_leaves_wal_shm_siblings_intact - tests/sqlite_permissions::wal_and_shm_sidecars_are_chmodded_0o600 Co-Authored-By: Claudius the Magnificent (1M context) <noreply@anthropic.com>
…ore_from (CMT-010) restore_from() was streaming the entire source DB into a NamedTempFile in the destination's parent dir BEFORE running the schema-history / max-version gate. A caller passing a valid-but-huge incompatible DB (CLI / FFI / UI import) filled the wallet partition before being rejected. Add cheap source-side schema-history presence + MAX(version) checks against the source handle that was already opened for integrity_check — failing fast saves the disk. Keep the staged-copy gate as the TOCTOU-safe final check: a concurrent swap during the restore window still gets rejected because the second check binds to the bytes actually persisted. Factored shared logic into migrations::max_supported_version() + migrations::assert_schema_version_supported() so the open()-path gate (CMT-005, next commit) reuses the same code. Regression: tests/sqlite_restore_staged_validation::forward_version_rejected_before_staging Co-Authored-By: Claudius the Magnificent (1M context) <noreply@anthropic.com>
… failure (CMT-002 data-loss) delete_wallet_inner drained the per-wallet buffered changeset BEFORE the existence check / auto-backup / DELETE transaction, but never restored it on any pre-commit failure. The operator's delete correctly reported the backup or SQL error, yet the wallet's pending writes vanished from the buffer even though no DELETE actually committed. Mirror flush_inner's take/restore discipline: hold the drained changeset in a Cell<Option<...>>, on success consume it AFTER tx.commit(), on any error path restore it into the buffer via self.buffer.restore(wallet_id, cs). If the restore itself fails (e.g. poisoned buffer mutex), log the lost changeset at error level — the delete error still surfaces (it's the primary cause). Regression: tests/sqlite_delete_wallet::delete_wallet_restores_buffer_on_backup_failure points auto_backup_dir at an unwritable path, primes a buffered changeset, asserts (a) delete returns AutoBackupDirUnwritable and (b) the buffered writes survive — a follow-up flush lands them. Co-Authored-By: Claudius the Magnificent (1M context) <noreply@anthropic.com>
…ic with restore_from (CMT-005) SqlitePersister::open ran the migration probe (count_pending) and refinery::run, but never compared the on-disk MAX(version) against the highest embedded migration. An older binary opening a DB produced by a newer release saw pending_count == 0 and silently proceeded to decode forward-schema blob columns — the migration version IS the entire compatibility boundary for this crate (per blob.rs:4-6). Add the same SchemaVersionUnsupported gate restore_from enforces, calling the shared migrations::assert_schema_version_supported helper introduced in the CMT-010 commit. Sites are now symmetric. Regression: tests/sqlite_open_version_gate::open_rejects_forward_schema_version forges a version=1_000_000 row, asserts the second open fails with SchemaVersionUnsupported. Co-Authored-By: Claudius the Magnificent (1M context) <noreply@anthropic.com>
…ent stores (CMT-008) delete_wallet_inner previously released and re-acquired the connection Mutex three times (existence-check / auto-backup / DELETE-tx). A parallel store(wallet_id, cs) could slip in between any pair of those scopes — in Immediate mode the racing flush landed AFTER the DELETE commit (wallet reappears), in Manual mode a buffered store survived the delete and was persisted by the next commit_writes() against a freshly-recreated parent. Approach (a): acquire self.conn()? ONCE at the top of delete_wallet_inner and hold it across the full pipeline so Immediate stores block on flush_inner's own self.conn(). For Manual mode (where store doesn't take conn at all), do a post-commit second drain of the buffer to discard any racing-buffered changeset — the wallet is being removed, those writes are intentionally void. run_auto_backup already accepted &Connection so no signature changes were required. Regression: tests/sqlite_delete_wallet::concurrent_store_does_not_resurrect_deleted_wallet spawns a worker hammering store() while the main thread deletes, commits any remaining writes, asserts every PER_WALLET_TABLES row count for that wallet_id is zero. Co-Authored-By: Claudius the Magnificent (1M context) <noreply@anthropic.com>
…e / typed AssetLockEntryMismatch (CMT-006/007)
CMT-006: blob::decode used bincode's NoLimit config, so an attacker-
controlled length prefix inside a Vec/String/map could drive the
allocator BEFORE the trailing-byte gate ever fired — a crafted backup
that passed restore_from's integrity check could OOM the host on the
next load(). Switch encode + decode to .with_limit::<16 MiB>() (well
above any legitimate per-row payload) and surface the cap-exceeded
case as a NEW typed variant `WalletStorageError::BlobTooLarge {
len_bytes, limit_bytes }` so operators can distinguish hostile/
corrupt oversize from a structural decode failure. Generic decode
failures (trailing bytes, schema mismatch) still surface as the
existing `BlobDecode` per user preference for precise variants.
CMT-007: asset_locks::decode_row decoded the typed outpoint from
op_bytes but then constructed TrackedAssetLock using the BLOB's
out_point + account_index. A torn write or restored corruption that
diverged blob-vs-typed would silently mis-bucket the lock under the
wrong account. Add a cross-check symmetric with IdentityKeyEntryMismatch
under a NEW typed variant `WalletStorageError::AssetLockEntryMismatch {
typed_outpoint, blob_outpoint, typed_account_index, blob_account_index }`.
is_transient() + error_kind_str() updated for both new variants — the
match remains wildcard-free per the existing #[non_exhaustive]-rejection
note in error.rs.
Regressions:
- src/sqlite/schema/blob.rs::tests::decode_rejects_oversize_blob_with_blob_too_large
- tests/sqlite_hardening_3625::asset_lock_typed_vs_blob_mismatch_rejected
Co-Authored-By: Claudius the Magnificent (1M context) <noreply@anthropic.com>
…inner blob (CMT-009)
IdentityKeyWire::into_entry decoded public_key_bincode with
bincode::decode_from_slice but bound the consumed count to `_`. A
corrupt or forward-incompatible inner payload whose IdentityPublicKey
prefix is valid would deserialize successfully even with garbage past
the typed length — leaving identity_keys weaker than every other blob
path. The outer blob::decode already enforces the same gate.
Capture consumed and surface BlobDecode("unexpected trailing bytes in
identity_keys.public_key_bincode") on mismatch (using the existing
blob_decode helper per user preference — trailing-bytes is the
canonical BlobDecode case, not a typed-variant case).
Regression: src/sqlite/schema/identity_keys::tests::into_entry_rejects_trailing_bytes_in_public_key_bincode
Co-Authored-By: Claudius the Magnificent (1M context) <noreply@anthropic.com>
…te (CMT-011) list_active was byte-for-byte identical to load_state — same SQL, same decode loop. Per the docstring's own admission, every persisted status is considered active (consumed locks leave via AssetLockChangeSet::removed), so the missing WHERE clause was intentional but made the two functions semantically indistinguishable. Option (a) per user preference: drop list_active, migrate the one caller (tests/sqlite_persist_roundtrip.rs:377) to load_state, refresh load_state's docstring to subsume the consumed-locks comment, and verify zero remaining references with grep. Also patches the wildcard-free match in tests/sqlite_error_classification.rs to cover the new AssetLockEntryMismatch / BlobTooLarge variants added in the prior CMT-006/007 commit (the exhaustiveness gate is what this test exists to enforce). Co-Authored-By: Claudius the Magnificent (1M context) <noreply@anthropic.com>
…MT-012) The smoke test previously accepted any non-success via assert!(!out.status.success(), ...) so a runtime failure unrelated to unknown-subcommand handling (panic during clap setup, OOM, signal kill) would still pass. Assert the explicit clap usage code Some(2) before checking stderr content — costs nothing, tightens the contract. Co-Authored-By: Claudius the Magnificent (1M context) <noreply@anthropic.com>
- sqlite_delete_wallet.rs: drop a useless std::fs::Permissions::from(...). - sqlite_permissions.rs: silence clippy::field_reassign_with_default (consistent with the existing test files' top-level allow). Co-Authored-By: Claudius the Magnificent (1M context) <noreply@anthropic.com>
…atch_only
load_from_persistor now rebuilds every persisted wallet watch-only from
its keyless AccountRegistrationEntry manifest (Wallet::new_watch_only)
and applies the keyless core-state projection on top. No seed material
is touched on the load path: signing keys are derived on demand later
through the MnemonicResolverHandle sign entrypoints, which carry the
fail-closed wrong-seed gate themselves.
Drops the SeedProvider port + WalletSecret/SecretPhrase/SecretSeed
payloads (and the storage CredentialStoreSeedProvider adapter that fed
them) — load no longer needs the abstraction. WrongSeedForDatabase
stays on PlatformWalletError for the sign-path gate. RT suite reshapes
to RT-WO (watch-only round-trip) + RT-Corrupt (per-row decode skip with
SkipReason::CorruptPersistedRow{kind: CorruptKind::MissingManifest}) +
RT-Z (no key material in any LoadOutcome / SkipReason surface).
apply_persisted_core_state and its F2/F3/F4 fixes are unchanged.
AR-7 residual risk on the load path is eliminated (no Wallet of a
signing type is constructed during load, so its Debug-leak surface is
gone from this path).
Co-Authored-By: Claudius the Magnificent (1M context) <noreply@anthropic.com>
…iveness, like clear_shielded/destroy shielded_sync_stop returned Success on status.is_clean() alone, ignoring a prior-generation shielded thread still parked alive as an orphan — asymmetric with clear_shielded/destroy and a misleading contract (the orphan still holds the host callback context). No live UAF today since Swift always does stop->destroy, but Success should imply no live shielded worker/orphan. Add manager::shielded_worker_alive() (the same shielded-scoped any_alive_for gate clear_shielded consults) and have shielded_sync_stop return ErrorShutdownIncomplete when a parked orphan survives a clean drain. FFI ABI unchanged (same return-code semantics); docstring updated so Success accurately implies no live shielded worker/orphan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_0157yd3YvWeyckhfQivS9gf7
…cing gate continuously SEC-001: clear_shielded's in-flight-pass drain was unbounded and the FFI clear bridge a bare block_on, so a heavy direct pass could hang the host (ANR). Bound it with a SHUTDOWN_JOIN_TIMEOUT_SECS timeout (mirroring shielded_sync_stop); on timeout the clear reports Timeout and aborts BEFORE the wipe, leaving the store intact. Split out clear_shielded_inner(drain_timeout) so the timeout path is testable without the 30s budget. SEC-002/RUST-002: the gate lapsed between quiesce() returning (its RAII guard lowers the shared flag) and the post-drain re-raise, letting a direct sync_now/sync_wallet slip past any_alive_for and re-persist into the wiped store. Fix: raise+HOLD the gate via clear's own guard BEFORE draining, and drain via a new gate-neutral quiesce_under_held_gate (extracted cancel_join_and_drain shared with quiesce, which stays byte-identical — Fix-1 invariant untouched). The gate now stays raised continuously across drain, liveness check, and wipe; doc softened to note the only residual is a full start() racing clear (per-key-latch follow-up). PROJ-004: clear now calls shielded_worker_alive() instead of re-inlining any_alive_for. Also clarifies the quiesce doc that only shielded gates sync_wallet (platform-address's is intentionally ungated). TDD: both SEC tests proven non-vacuous (revert->fail->restore). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_0157yd3YvWeyckhfQivS9gf7
…story-tags RUST-004: a failed thread spawn left the slot carrying the FAILED start's weight/drain/join_budget (and a bumped generation). Now snapshot the pre-start config and restore ALL of it on the Err path, so the re-installed prior keeps its own teardown config; generation rolls back too (the +1 is only observed under the slot lock and the failed start spawns no thread, so the rollback is net-zero and the externally-visible generation stays monotonic). New regression test asserts the restored config. RUST-005: trimmed the duplicated park-under-lock rustdoc block in start_thread that repeated park_prior_locked's doc. RUST-003/PROJ hygiene: removed [F1/F2/F3 FIX] history-tags from committed comments, replaced the 'Why F1 and F2 cannot recur' module section with present-state invariant descriptions, and fixed the glossary to reference the key-scoped any_alive_for (the gate store-wiping paths actually consult) rather than the registry-wide any_alive. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_0157yd3YvWeyckhfQivS9gf7
…an-trigger contract shielded_sync_stop now also returns .errorShutdownIncomplete when the drain was clean but a prior-generation shielded thread is still parked alive as an orphan (not only when the in-flight drain times out). Update the manager deinit comment to reflect both triggers; behaviour unchanged (the deinit already leaks one strong ref to the handlers on .errorShutdownIncomplete). Comment-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_0157yd3YvWeyckhfQivS9gf7
…fencing (SeqCst) SEC-003: the pass-gate handshake is a Dekker-style StoreLoad pair across two distinct atomics — quiesce does store(quiescing) … load(is_syncing); begin_pass does CAS(is_syncing) … load(quiescing). Release/Acquire do NOT order StoreLoad across separate locations, so by the annotations alone both sides could miss each other (begin_pass reads a stale quiescing==false and runs a pass past a raised gate while the drain reads a stale is_syncing==false and returns). It was sound only incidentally — registry.quiesce happens to take the slots Mutex (a fence) before returning; a lock-free refactor of that path would make the race live. Promote the four handshake ops to SeqCst (a single total order guarantees at least one side observes the other): the gate-raise stores in quiesce, hold_quiescing_gate, and the registry drain hook; the is_syncing CAS (success) and quiescing load in begin_pass; and the is_syncing load in the drain. Gate lowering (reopen / RAII drop) and observational reads stay Release/Acquire — a stale-high gate read only makes a pass bail conservatively. Fix-1's gate-before-cancel + never-latched invariant is preserved (SeqCst is strictly stronger than the prior Release). Added a load-bearing-ordering comment at begin_pass. Not unit-testable (ordering); the existing gate/drain handshake tests still pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_0157yd3YvWeyckhfQivS9gf7
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
All three prior findings (911f99f) are resolved at 748c4f8: CoordinatorLifecycle::quiesce now self-raises the quiescing gate and drains in-flight direct passes (coordinator_lifecycle.rs:166-173, with a regression test); the shielded_sync_stop FFI now consults manager.shielded_worker_alive() and returns ErrorShutdownIncomplete when a prior-generation shielded orphan is parked (shielded_sync.rs:127-157); and start_thread now parks the prior worker under the slot lock via park_prior_locked before the out-of-lock reap, with a regression test in the registry. One in-scope cumulative concern remains: the shared quiescing AtomicBool means a concurrent quiesce()'s AtomicFlagGuard drop can lower the gate that clear_shielded's hold_quiescing_gate is supposed to keep raised continuously — the mechanism does not match the doc's continuous-hold promise in clear_shielded_inner. In practice this is gated by the host's UI-thread serialization, so it is a suggestion, not a blocker.
🟡 1 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs:166-173: clear_shielded's "continuously held" gate can be lowered by a concurrent quiesce()
`clear_shielded_inner` holds `_clearing_gate = shielded_sync_manager.hold_quiescing_gate()` across drain → liveness check → wipe, expecting the shielded `quiescing` boolean to stay `true` for the whole sequence (and the doc at manager/mod.rs:517-520 explicitly promises "the quiescing gate is raised continuously" so concurrent direct `sync_now`/`sync_wallet` are held off). But the same `AtomicBool` backs `CoordinatorLifecycle::quiesce`, which builds its own `AtomicFlagGuard` (coordinator_lifecycle.rs:171); `AtomicFlagGuard::drop` unconditionally stores `false` (rs-dash-async/src/atomic.rs:25-29). If `platform_wallet_manager_shielded_sync_stop` is invoked concurrently with `clear_shielded` (the FFI handle store only takes a brief read lock in `with_item`, so two calls on the same handle can interleave), the stop's `quiesce()` returns and its dropping guard clears the flag while clear is still between its drain and its wipe. A direct `sync_now`/`sync_wallet` then sees `quiescing=false` in `begin_pass`, claims `is_syncing`, and can re-persist into the store `coord.clear()` is about to wipe — a store/host desync. The host's UI-thread serialization mitigates this today, but the documented invariant on `clear_shielded` only excludes concurrent `shielded_sync_start`, not `_stop`. Fix options: refcount the gate (so the inner drop doesn't lower it under the outer hold), use a clear-specific second flag that `begin_pass` also consults, or tighten the host-serialization doc to enumerate `_stop` too.
| } | ||
|
|
||
| impl SlotState { | ||
| fn dormant() -> Self { |
There was a problem hiding this comment.
why not just impl Default?
There was a problem hiding this comment.
Addressed in 0e73de7: fn dormant() replaced with a manual impl Default for SlotState. Manual (not derived) — #[derive(Default)] would zero join_budget, but the dormant slot must carry DEFAULT_JOIN_BUDGET = 30s. Both call sites switched to or_default().
(Reply-only — leaving for you to resolve.)
| if slot.cancel.is_some() { | ||
| return; | ||
| } | ||
| let prior = slot.handle.take(); |
There was a problem hiding this comment.
Same code in both cases. DRY.
There was a problem hiding this comment.
Addressed in 0e73de7: extracted the duplicated 8-line slot-rotation block into SlotState::prepare(cfg) -> (Option<WorkerHandle>, CancellationToken, u64). start_thread keeps its rollback-snapshot lines (captured BEFORE calling prepare) so a spawn failure can restore the prior config; start_task just calls prepare.
(Reply-only — leaving for you to resolve.)
| // R2: gate-before-cancel — fully await the drain hook before the | ||
| // cancel signal is observed. | ||
| if let Some(drain) = drain { | ||
| drain().await; |
There was a problem hiding this comment.
If we don't have a timeout here, at least measure time and tracing::debug! it if operation takes more than expected
There was a problem hiding this comment.
Addressed in 0e73de7: drain hook is now timed via Instant::elapsed() with DEBUG/WARN logging in quiesce. WARN fires above DRAIN_HOOK_WARN_THRESHOLD = 10s (1/3 of DEFAULT_JOIN_BUDGET). No hard timeout — the caller still bounds the whole teardown; this is observability for a hung drain.
(Reply-only — leaving for you to resolve.)
| let (drain, budget, my_gen) = { | ||
| let slots = self.lock_slots(); | ||
| match slots.get(&key) { | ||
| Some(s) if s.cancel.is_some() || s.handle.is_some() => { |
There was a problem hiding this comment.
why not use slot_alive() ?
There was a problem hiding this comment.
Addressed in 0e73de7: added an inline comment in quiesce explaining why s.cancel.is_some() || s.handle.is_some() is used rather than slot_alive() — a finished-but-unreaped handle still needs classifying into its terminal status; slot_alive() treats handle.is_finished() as not-alive and would short-circuit to NotRunning, dropping the result.
(Reply-only — leaving for you to resolve.)
| /// | ||
| /// Panic-strategy caveat (same as `AtomicFlagGuard`): the clear-on-panic | ||
| /// half relies on `Drop` running while the stack unwinds, so it holds under | ||
| /// `panic = "unwind"`. Under `panic = "abort"` a worker panic aborts the |
There was a problem hiding this comment.
can we detect panic mode and log a warning if it's wrong?
There was a problem hiding this comment.
Yes. I added a small follow-up on thepastaclaw:tracker-1583-panic-mode-warning that detects the build panic strategy with cfg(panic = "abort") and emits a one-shot tracing::warn! from ThreadRegistry::with_reap_backstop when the crate is built with abort semantics. It also updates the nearby EpilogueGuard docs and adds a focused idempotence/cfg regression test.
Commit: thepastaclaw@7f353680000c
Validation: cargo test -p dash-async --lib registry::, cargo check -p dash-async --tests, cargo clippy -p dash-async --tests -- -D warnings, cargo fmt --all --check.
There was a problem hiding this comment.
The panic-strategy caveat is the correct handling: stable Rust has no runtime API to query the active panic strategy — only #[cfg(panic = "...")] is available, and iOS release intentionally uses panic = "abort", so a hard compile_error! gate would be wrong for that target. The doc note stays as a known limitation. No code change.
(Reply-only — leaving open for you to resolve after review.)
…Report + registry ergonomics T5 (correctness): `shutdown()` discarded the `reap_orphans_impl` status, so a panicked/errored orphan that finished within the reap grace (`detached == 0`, status non-clean) silently passed `all_clean()`. Add a new `orphan_status: WorkerStatus` field to `ShutdownReport`, fold it through `all_clean()`, and capture it in `shutdown()`. Mirror the propagation on the wallet boundary: `CoordinatorExitStatus::from_report` now folds `orphan_status` through `detached_threads` when the survivor count is zero, so the FFI/Swift surface stays honest. Two registry regression tests (panicked + clean reaped orphan), two wallet boundary tests (propagation + non-over-trigger). T1 (ergonomics): replace `fn dormant()` with a manual `impl Default for SlotState`. `#[derive(Default)]` would zero `join_budget`, so the impl is hand-rolled to preserve `DEFAULT_JOIN_BUDGET = 30s`. Both call sites switch to `or_default()`. T2 (DRY): extract the duplicated 8-line slot rotation into `SlotState::prepare(cfg) -> (Option<WorkerHandle>, CancellationToken, u64)`. `start_thread` keeps its rollback-snapshot lines (captured before `prepare`) so a spawn failure can restore the prior config. T3 (observability): wrap `drain().await` in `quiesce()` with an `Instant`-based timing log — DEBUG by default, WARN when the drain exceeds `DRAIN_HOOK_WARN_THRESHOLD = 10s` (1/3 of `DEFAULT_JOIN_BUDGET`). No hard timeout — the caller still bounds teardown. T4 (invariant comment): document why `quiesce`'s slot-inhabited check uses the raw `cancel.is_some() || handle.is_some()` rather than `slot_alive()` — a finished-but-unreaped handle still needs classifying; `slot_alive()` would false-`NotRunning` it. Tests: 31/31 dash-async, 212/212 wallet default, 312/312 wallet --features shielded. cargo clippy --all-targets -- -D warnings clean. Addresses PR #3954 review threads T5, T1, T2, T3, T4. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
…ed mid-flight T8/T9/T10: All three coordinator `stop()` docs claimed a pass already inside `sync_now` "keeps running to completion". The loop's `biased; cancel-first` select in `start()` actually drops the `sync_now` future at its next `.await` the instant cancel fires (see the existing comments at `start()`). Correct each docstring: - `identity_sync.rs`: in-flight pass cancelled mid-flight; completed `persister.store(...)` calls committed, in-flight store abandoned. - `platform_address_sync.rs`: in-flight pass cancelled mid-flight; `on_platform_address_sync_completed` host callback may not fire. - `shielded_sync.rs`: in-flight pass cancelled mid-flight; persister- callback fan-out for completed stores has fired, in-flight stores abandoned. The functional contract (use `quiesce()` for a real barrier) is unchanged; only the description of stop's effect on the in-flight pass is now accurate. Addresses PR #3954 review threads T8, T9, T10. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
…nates concurrent-AtomicFlagGuard race T11 primary path: `platform_wallet_manager_shielded_sync_stop` ran `quiesce()` inline, which raises the `quiescing` gate via its own `AtomicFlagGuard`. A concurrent `shielded_clear` that holds the gate continuously via `hold_quiescing_gate` would then have stop's second guard's `Drop` lower the gate, opening a window for a new pass to slip past the "no new pass" barrier. Convert stop to cancel-only — matching `platform_address_sync_stop` / `identity_sync_stop`: signal the loop, return immediately. The join + orphan-liveness gate that prevents host UAF stays single-sourced on `destroy` (and `shielded_clear` for the clear flow). With no second guard ever created by stop, the primary T11 path is closed. Ripple updates so docs stay accurate: - `error.rs` `ErrorShutdownIncomplete` doc: removed `shielded_sync_stop` from the returners list — now only `destroy` + `shielded_clear`. - `accessors.rs` `shielded_worker_alive()` doc: same. - Swift `PlatformWalletManager.deinit`: both stops now `.discard()`; leak-on-incomplete logic keyed solely on `destroyCode`. - Swift `PlatformWalletManagerShieldedSync.stopShieldedSync()` comment reflects cancel-only semantics — Swift-side generation bump still drops trailing events delivered to `@MainActor`. DEFERRED (follow-up): - T11 RESIDUAL (5d569459 per-key clearing latch in ThreadRegistry, for a full `start()` racing `clear`) — substantial new mechanism, should be its own focused PR. - security-engineer adversarial verification of SEC-002 gate-continuity + F2 orphan-awareness intact under cancel-only — must run before merge. Tests: 108/108 FFI tests pass with shielded; 312/312 wallet --features shielded; cargo clippy --workspace --features shielded --all-targets -- -D warnings clean. Swift side not built (no toolchain on host) — 6b6a5a0d Swift `deinit` build-verify remains a separate macOS merge gate. Addresses PR #3954 review thread T11 (primary path). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
…-adapter (T7) The wallet-event adapter's `tokio::select!` broke on `cancel.cancelled()` without draining the broadcast::Receiver's buffer. Events emitted just before stop (e.g. a `TransactionInstantLocked`, which P2P does NOT replay) were silently lost — the next sync subscribes fresh and only sees future events. Drain in the cancel branch via `try_recv()` before exiting. Bounded by `CANCEL_DRAIN_BUDGET = 4096` (well above the 256-cap broadcast capacity, so a normal teardown drains everything; pathological flood can't stall shutdown — respects the SEC-001 bounded-teardown invariant). Refactor: extract `process_event()` so the live loop and the cancel- drain cannot drift in behaviour; `drain_buffered_events()` handles `Lagged` with a warn and reports drained count for trace observability. Caveat: block-driven events (TransactionDetected, BlockProcessed, SyncHeightAdvanced) ARE re-derivable on the next SPV resync from `last_processed_height`, so the primary risk closed here is the non-replayed IS-lock case. The drain treats all kinds uniformly for simplicity. Tests: 312/312 wallet --features shielded pass — no regressions. A focused unit test for the cancel-drain path requires synthesizing upstream `WalletEvent`s and a persister mock for which no existing test infrastructure is set up; tracked as a follow-up TODO. Addresses PR #3954 review thread T7. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
…ones, surface deferrals Phase 3 QA on the 4 PR #3954 review-thread commits surfaced 14 doc-rule findings (2 MEDIUM, 12 LOW; Marvin + Adams + Smythe). Smythe verified all 4 critical invariants (SEC-001/002, F1, F2) INTACT. Apply the trivial fixes here; defer the remaining substantial items. MEDIUM addressed: - DOC-001 + PROJ-002: drop "Rationale: before this change..." historical paragraph (and its `prior quiesce-here design` cousin in the inline) from the `shielded_sync_stop` FFI docstring. The full causal chain is in commit f94fed9's body; readers have `git blame` for context. The invariant ("a second `AtomicFlagGuard` here would race a continuously- held gate in `shielded_clear`") is restated in present tense. - DOC-002: strip ephemeral `T7:` review-thread ID from the core_bridge cancel-arm comment; trim 5 lines → 3. - PROJ-001: drop stale `(mirroring shielded_sync_stop's timeout)` from `clear_shielded_inner` — post-refactor `shielded_sync_stop` has no drain and no timeout; the cross-reference is now false. LOW addressed: - DOC-003: trim `SlotState::prepare` doc 7 → 4 lines (private fn). - DOC-004: trim `quiesce` R2 timing comment 5 → 3 lines. - DOC-006: trim `drain_buffered_events` doc 4 → 3 lines. - DOC-007: trim `from_report` doc + `detached_threads` inline. - DOC-008: trim Swift `stopShieldedSync` comment 7 → 4 lines. - PROJ-005: add `// TODO` marker above `drain_buffered_events` for the deferred unit test (no test scaffolding exists for the event adapter today). Project convention requires an in-source TODO. - PROJ-006: correct the Swift `shieldedSyncGeneration` docstring — `stop` has no Rust quiesce barrier post-refactor; it cancels the in-flight pass, and `clear` is the one with the full quiesce. Deferred (also surfaced as silent gaps by Adams PROJ-003/004, tracked separately): - 18d7acf4 — `CoordinatorThreadStatus`/`CoordinatorExitStatus` collapse onto `dash_async::WorkerStatus`/`ShutdownReport` (separate PR; T5 is fully fixed without it via the wallet-side `from_report` propagation). Tracked as a MemCan todo for follow-up. - 0d895258 — broader ThreadRegistry comment-hygiene sweep (separate audit/sweep PR). - 5d569459 — T11 RESIDUAL per-key clearing latch (separate PR; Smythe verified residual is acceptable MEDIUM — requires host-side programming error, data-integrity-only impact). Verified post-fix: 31/31 dash-async, 312/312 wallet --features shielded, 108/108 FFI shielded tests pass. cargo fmt --all --check + cargo clippy --workspace --features shielded --all-targets -- -D warnings clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
…let-shutdown-join
…, PROJ-001/003/004 + Swift doc Grumpy-review fan-out (smythe / adams / bilby, opus, parallel) over the full PR delta (vs v3.1-dev, merge-base 1653b89). Smythe: 2 LOW. Bilby: 2 MED + 7 LOW. Adams: 1 MED + 5 LOW. No CRITICAL/HIGH. All four critical concurrency invariants (F1/F2/SEC-001/SEC-002) verified INTACT. Applied the trivial-to-medium findings here; deferrals tracked. MEDIUM applied: - SEC-001 (smythe, core_bridge.rs): cancel-drain budget was bypassable by a sustained broadcast `Lagged` (the arm logged but never advanced `drained`, so the loop could spin past the cap until the outer registry budget fired). Fix: bound on an `attempts` counter that ticks regardless of try_recv outcome, so Lagged consumes a budget slot like everything else. `drained` keeps its semantics (events actually persisted) and is reported on the budget-hit warn for observability. - RUST-001 (bilby, core_bridge.rs:423-439): `is_empty_no_records` was byte-for-byte identical to `Merge::is_empty` (changeset.rs:236) with a self-contradicting docstring ("skips" a field it actually checks). Two copies could drift silently and skip persisting a populated CoreChangeSet → quiet data loss. Delete the local copy, call `Merge::is_empty` via the trait import. - PROJ-001 (adams, mod.rs:271-282): `SHUTDOWN_JOIN_TIMEOUT_SECS` doc still listed the FFI shielded-stop bridge as a quiesce+join consumer — but commit f94fed9 made that bridge cancel-only. Drop the stale mention; note the cancel-only status inline. LOW applied: - RUST-004 (bilby): add `#[must_use]` to `ShutdownReport` and `CoordinatorExitStatus` — silently dropping the UAF-avoidance `all_clean()` signal now warns at compile time. tc010 test that intentionally discards the panic-result is annotated with `let _ =`. - RUST-005 (bilby): document `start_thread`'s synchronous reap-spin (`std::thread::sleep` up to `reap_backstop`) as async-unsafe; should be invoked via `spawn_blocking` or a dedicated thread. Footgun prevention for the rs-dapi-client adoption that's already in scope. - RUST-006 (bilby, core_bridge.rs:296-302): trim the `ChainLockProcessed` comment's "earlier signal had a gap" history narration; keep the present-state rationale + issue citation. - RUST-007 (bilby, ffi/manager.rs:383-385): drop the redundant `else` after the diverging if-return in the FFI destroy handler. - PROJ-003 (adams): add a focused test for `cancel_all()` — cancel three workers in one call, assert none are running, drain each cleanly. Was a public method with no in-repo caller. Test count: dash-async 31 → 32. - PROJ-004 (adams, PlatformWalletManager.swift:155-162): deinit comment said "signal both" then "all three discard" — count mismatch. Reword to explicitly name the two host-driven loops (platform-address + shielded) and note identity-sync + event adapter are joined inside destroy. - CALL-001 (adams, swift example app ShieldedService.swift:537-543): `clearLocalState` doc claimed `stopShieldedSync` provides the drain barrier; post-refactor that's `clearShielded()`. Doc-only; the impl was already correct. DEFERRED / DISMISSED (with rationale): - SEC-002 (smythe): T11 RESIDUAL — `shielded_sync_start` racing `clear_shielded` reopens the held quiescing gate. Documented as a host-discipline precondition + in-source TODO; the per-key clearing latch is queued as memcan todo 5d569459 (medium priority). Smythe's earlier adversarial pass classified the residual as an acceptable MEDIUM (host-side programming error to trigger; data-integrity, not UAF). - RUST-002 / PROJ-002: cancel-drain unit test for the IS-lock-loss path. No existing test scaffold for `wallet_event_adapter_loop`; the in-source TODO above `drain_buffered_events` already marks the work as deferred. Follow-up. - RUST-003 (dead per-coordinator `pub quiesce()`): retained — they are intentional public API for parity with the three coordinator surfaces, removing them mid-PR would surprise downstream consumers. - RUST-008 / CODE-001 (ephemeral IDs `R1`/`R2`/`R3`, `F1`/`F2`, `SEC-001`/`SEC-002`, `TC-XXX`, `GAP-XXX`): DISMISSED. These are project invariant labels documented in module-level docs and used as cross-reference anchors throughout the codebase (R1 = lock atomicity, R2 = gate-before-cancel, R3 = kind-dispatch; F1 = UAF safety; SEC/TC/GAP = security/test-case/gap invariant IDs). They are stable convention, not transient review-thread IDs (those were T1-T11 and never appeared in committed code; the one `T7:` leak Marvin caught earlier is fixed). Grep for `T[0-9]` review IDs and `PR#NNN` patterns returned only durable cross-repo issue citations. - RUST-009 (over-budget internal comment blocks): bilby's own note acknowledged the rationale is justified concurrency invariant prose; consolidating into module-level docs is broader cleanup tracked under the 0d895258 comment-hygiene sweep. Verified: cargo fmt --all --check + cargo clippy --workspace --features shielded --all-targets -- -D warnings clean. Tests: 32 dash-async (31 → 32, +cancel_all_signals_every_worker), 312 wallet --features shielded, 108 wallet-ffi shielded. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Incremental review of 748c4f8..6ed5200. Latest delta adds the server-advertised rate-limit ban path in rs-dapi-client + dashmate gateway wiring, simplifies platform_wallet_manager_shielded_sync_stop to cancel-only, drains buffered wallet events on cancel in event-adapter (T7), and threads orphan_status through ShutdownReport. No new blocking defects found. The prior architectural finding around clear_shielded's continuously-held gate vs. CoordinatorLifecycle::quiesce on the same AtomicBool is STILL VALID but now latent: ShieldedSyncManager::quiesce remains pub on the same atomic, however no FFI export or in-tree Rust caller races it against clear_shielded (verified). Carried forward as suggestion only. Model coverage degraded: Codex (general/rust-quality/ffi-engineer) did NOT run this round due to ACP/native auth stall; review is Claude-only.
Model coverage note: Claude general, Claude rust-quality, and Claude ffi-engineer ran. Codex general/rust-quality/ffi-engineer did not run because ACP-Codex authentication stalled during the authenticate handshake and native Codex CLI returned 401/missing bearer; this review is therefore degraded from the usual Claude+Codex matrix.
🟡 1 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs:166-249: Public ShieldedSyncManager::quiesce can still lower clear_shielded's continuously-held gate (carry-forward of prior 748c4f82-1)
`clear_shielded_inner` (manager/mod.rs:548) holds `_clearing_gate = shielded_sync_manager.hold_quiescing_gate()` across the entire drain → liveness-check → store-wipe sequence, expecting the shielded `quiescing` boolean to stay raised so no direct `sync_now`/`sync_wallet` slips past and re-persists into the store being wiped (mod.rs docs 515-524). The hold guard at coordinator_lifecycle.rs:243-249 and the RAII guard inside `CoordinatorLifecycle::quiesce` at line 171 both wrap the SAME `Arc<AtomicBool>`, and `AtomicFlagGuard::drop` unconditionally `store(false, Release)`s. A concurrent invocation of `ShieldedSyncManager::quiesce` (which is still `pub` at shielded_sync.rs:280) would lower the gate mid-Clear, opening exactly the window the docstring promises is closed.
Reachability on this head: verified there is no FFI export that triggers this — `platform_wallet_manager_shielded_sync_stop` is cancel-only in this delta, `destroy` goes through the registry shutdown path whose drain hook only stores `true` with no RAII guard (lines 121-131), and `shielded_clear` uses `quiesce_under_held_gate`. The only in-tree caller of `shielded_sync().quiesce()` is the internal test at coordinator_lifecycle.rs:346. So the cross-boundary exploit path is closed by surface; the residual is a fragile in-process Rust contract that future callers can break silently. The module docs at mod.rs:515-524 already acknowledge this as a follow-up (per-key clearing latch / refcount on the registry). Suggested cheap structural fixes: narrow `ShieldedSyncManager::quiesce` to `pub(crate)` (no FFI consumer today), refcount the gate, or force every caller through `quiesce_under_held_gate` after `hold_quiescing_gate`. Non-blocking — flagged so the latent fragility does not regrow once new shielded entry points are added.
…orLifecycle::spawn_periodic_loop (d2e1a335) The three periodic-sync coordinators (identity_sync, platform_address_sync, shielded_sync) had byte-identical `start()` loops — `reopen_quiescing_gate` → `worker_config` → `Handle::current` → `registry.start_thread` with the same `biased; cancel-first` select + interval sleep body, plus a near-identical multi-paragraph comment. Hoist the loop into `CoordinatorLifecycle::spawn_periodic_loop<F, Fut, I>` so each coordinator keeps only its domain-specific pass body. Each coordinator's `start()` now collapses to a 10-line thunk: clone two `Arc<Self>` (one for `pass`, one for `interval`), call `self.lifecycle.spawn_periodic_loop(...)`. The `pass` closure clones the captured Arc once per tick (cheap) and `.await`s the coordinator's `sync_now` — `Fn() -> Fut` semantics with the inner future free to be `!Send` since it runs under `Handle::block_on` on the OS-thread worker. Knock-on cleanup: `CoordinatorLifecycle::registry()` and `worker()` had no remaining callers after the hoist; removed both. Tests: 312/312 wallet --features shielded pass (no regressions). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
…orExitStatus onto dash_async (18d7acf4)
Both wrapper types were new in this PR, so this is API churn within the
PR, not a released-API break. They duplicated dash_async equivalents:
- delete CoordinatorThreadStatus (enum + is_clean + From<WorkerStatus>);
callers use dash_async::WorkerStatus directly
- delete CoordinatorExitStatus (struct + all_clean + from_report);
PlatformWalletManager::shutdown now returns ShutdownReport<WalletWorker>
A never-started worker is now an absent per_worker key rather than a
materialized NotRunning slot; all_clean() is unchanged.
Touchpoints:
- rs-platform-wallet/src/manager/mod.rs: delete both types, shutdown
returns the report, clear_shielded_inner + tests use WorkerStatus
- rs-platform-wallet/src/manager/coordinator_lifecycle.rs: quiesce /
quiesce_under_held_gate / cancel_join_and_drain return WorkerStatus
- rs-platform-wallet/src/manager/{identity,platform_address,shielded}_sync.rs:
quiesce returns dash_async::WorkerStatus
- rs-platform-wallet/src/error.rs: ShieldedShutdownIncomplete.status: WorkerStatus
- rs-platform-wallet/src/lib.rs: re-export dash_async::{ShutdownReport, WorkerStatus}
- rs-platform-wallet-ffi/src/error.rs: test uses platform_wallet::WorkerStatus
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…atorExitStatus onto dash_async (18d7acf4)
`CoordinatorThreadStatus` and `CoordinatorExitStatus` (both NEW in this
PR, never released) were a thin, hand-maintained shadow of
`dash_async::WorkerStatus` and `ShutdownReport<K>` — exhaustive 1:1
`From` mapping for `WorkerStatus`, named-field projection for
`ShutdownReport` — and required a per-PR test of the variant table
that drifted on every status addition (T5's `orphan_status` field).
Delete the wrappers and use the dash-async types directly. The wallet
surface keeps the same end-user-visible verdict (`all_clean()`) and
gains the orphan-status fold-through for free (the registry already
owns it post-T5).
Touchpoints (every wallet/FFI/test reference):
- packages/rs-platform-wallet/src/manager/mod.rs — delete the two
types + their impls; `shutdown()` returns `ShutdownReport<WalletWorker>`
directly; `clear_shielded_inner`'s `Err(_elapsed)` arm and Detached
raise use `WorkerStatus::{Timeout, Detached}`; all 11+ unit-test
blocks rewritten to assert against `report.per_worker.get(&key)` /
`report.all_clean()` instead of the named-field projections.
- packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs —
`quiesce`, `quiesce_under_held_gate`, `cancel_join_and_drain` return
`WorkerStatus`; drop the now-redundant `.into()` at the `quiesce`
call site.
- packages/rs-platform-wallet/src/manager/{identity_sync,
platform_address_sync,shielded_sync}.rs — coordinator `quiesce()`
methods return `WorkerStatus`.
- packages/rs-platform-wallet/src/error.rs — `ShieldedShutdownIncomplete
{ status }` carries `dash_async::WorkerStatus`.
- packages/rs-platform-wallet/src/lib.rs — drop the wallet-type re-
exports; re-export `dash_async::{ShutdownReport, WorkerStatus}` so the
PR-introduced public surface stays single-import.
- packages/rs-platform-wallet-ffi/src/error.rs — `is_clean()` and Debug
on `WorkerStatus` work unchanged; one stale type reference updated.
- packages/rs-dash-async/src/registry.rs — `WorkerStatus` doc was the
source of the wallet-CoordinatorThreadStatus cross-reference; rewrite
it standalone (the wallet type it described no longer exists).
Semantic change worth flagging (preserved end-user behavior):
- `CoordinatorExitStatus` materialised never-started workers as
`NotRunning` (`shielded_sync: Some(NotRunning)` / `None` for non-
shielded builds). `ShutdownReport` encodes "never started" as
key-absent-from-`per_worker`. End-user verdict (`all_clean()`) is
identical; tests that compared `== NotRunning` now check
`!report.per_worker.contains_key(&WalletWorker::X)`.
Verified: cargo fmt --all --check + cargo clippy --workspace --features
shielded --all-targets -- -D warnings clean.
Tests: 32 dash-async (incl. 1 doctest), 312 wallet --features shielded,
108 wallet-ffi --features shielded — zero regressions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
…5d569459) After the 8fb49ae9 cancel-only `shielded_sync_stop` closed T11's primary path, a residual remained: a concurrent `shielded_sync_start()` racing `clear_shielded` calls `reopen_quiescing_gate` (unconditional store-false on the shared flag), lowering the gate `clear_shielded_inner` holds continuously. If the start lands between the clear's liveness check and the store wipe, the fresh sync loop re-persists into the just-wiped store — a data-integrity desync (not a UAF — the documented host serialization precondition has held in practice, but in-code defence is better than relying on host discipline). Close it with a per-key clearing latch on `ThreadRegistry`: - New `clearing: Mutex<BTreeSet<K>>` field on the registry — mirrors the existing `closing` latch pattern but per-key and resettable. - `start_thread`/`start_task` honour the latch under the same slot lock they honour `closing` under, so a `(re)start` for a key whose owner is mid-clear is refused (no slot installed, no worker spawned). - New `pub fn hold_clearing(self: &Arc<Self>, key: K) -> ClearingGuard<K>` returns an RAII guard; the guard's `Drop` removes the key from the set on every exit path, including panic unwinding — callers cannot leak the latch. - `Debug` impl picks up the new `clearing` field. Wired into `PlatformWalletManager::clear_shielded_inner`: the clear flow now holds BOTH the registry latch (against full `(re)start`s) AND the existing quiescing gate (against direct `sync_now`/`sync_wallet` passes that bypass the registry via `begin_pass`). Tests: - dash-async 32 → 35 (+`hold_clearing_blocks_starts_for_latched_key_only`, `hold_clearing_nested_guards_share_the_latch`, `hold_clearing_releases_on_panic_unwind`). - wallet `--features shielded` 312 → 313 (+`shielded_sync_start_no_ops_while_clearing_latch_held` — the wallet- level integration test that proves T11 residual is closed: holding the latch (mirror of `clear_shielded_inner`'s hold), a concurrent `shielded_sync_manager.start()` leaves `is_running()==false`; sibling coordinators (per-key latch) start cleanly; latch release restores startability). - wallet-ffi `--features shielded` 108 (unchanged). - cargo fmt + cargo clippy --workspace --features shielded --all-targets -- -D warnings clean. Closes the deferred 5d569459 follow-up flagged by Smythe's adversarial review of the T11 primary fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
…eadRegistry diff (0d895258) Apply Cross-Cutting Rules (length cap ≤2-3 internal / ≤5-10 public API doc; present-state only; two-tier audience) to comments INTRODUCED or MODIFIED by this PR vs v3.1-dev. Load-bearing concurrency rationale preserved verbatim — the comments that actually carry teardown safety information (SeqCst handshake, slots→orphans lock-nesting, F1/F2 invariants, panic = "abort" caveats, must_use justifications, clearing- latch contract) keep their full prose because removing them would let a maintainer break the invariant without realising it. Files swept: - packages/rs-dash-async/src/registry.rs — trimmed several over-long internal blocks (most R1/R2/R3 narration condensed; the load-bearing bits kept). T-id review-thread leaks in private test fn names removed (t5_* → bare descriptive names; 3 fns in this file). - packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs — trimmed comments around spawn_periodic_loop and the gate-hold guards. - packages/rs-platform-wallet/src/manager/mod.rs — trimmed comments around clear_shielded_inner + the panicked-orphan tests; renamed the one remaining t5_-prefixed test fn. Verification post-sweep: - cargo fmt --all -- --check clean. - cargo clippy --workspace --features shielded --all-targets -- -D warnings clean. - dash-async 35 / wallet --features shielded 313 / wallet-ffi --features shielded 108 — no regressions. - grep across rs-dash-async + rs-platform-wallet[-ffi] for ephemeral T1-T11 review-thread IDs: zero hits (only project-invariant labels R1/R2/R3, F1/F2, SEC-001/SEC-002, TC-XXX, GAP-XXX remain — those are cross-referenced design anchors, not transient PR IDs). Closes the deferred 0d895258 follow-up. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
…nsafe) for Swift 6 strict concurrency CI Swift build was failing with: error: cannot access property 'persistenceHandler' with a non-sendable type 'PlatformWalletPersistenceHandler?' from nonisolated deinit (and the same for 'eventHandler') Swift 6 strict concurrency forbids accessing non-`Sendable` properties from a `nonisolated deinit`, but our deinit must read both handler references to `passRetained` them on incomplete-shutdown — leaking the reference is the load-bearing UAF avoidance for a wedged FFI worker that may still call back through them. Opt the two properties out of the check with `nonisolated(unsafe)`. The access is safe: it only runs on the deinit path of the wrapper class itself, after the Rust side has already signalled the worker did not drain cleanly (a documented FFI lifetime contract — see the comments on the handlers). The handler types themselves remain non-`Sendable` across the rest of the codebase; this is a targeted escape hatch for the deinit-only access. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Two blocking issues found at HEAD 284752a. (1) ShieldedSyncManager::start()'s spawn_periodic_loop calls reopen_quiescing_gate() before start_thread's clearing-latch check — a racing FFI shielded_sync_start during clear_shielded silently lowers the clear's continuously-held quiescing gate, partially defeating the new latch. (2) The Swift ShieldedService.clearLocalState catches clearShielded()'s ShieldedShutdownIncomplete error and continues to wipe SwiftData, violating the explicit PlatformWalletResult contract ("must NOT commit its own persistence wipe — the Rust store was left intact"). Prior carry-forward finding (public ShieldedSyncManager::quiesce can lower the held gate) is STILL VALID as a latent suggestion. Several lower-severity issues around the ClearingGuard's set-membership semantics and its test coverage are kept as suggestions/nitpicks.
🔴 2 blocking | 🟡 1 suggestion(s) | 💬 1 nitpick(s)
1 additional finding(s) omitted (not in diff).
1 carried-forward finding(s) already raised on this PR; not re-posting as new inline comments.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs`:
- [BLOCKING] packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs:131-161: Refused shielded (re)start still lowers clear_shielded's continuously-held quiescing gate
`spawn_periodic_loop` calls `self.reopen_quiescing_gate()` at line 137 (which `store(false, Release)`s `quiescing`) before invoking `registry.start_thread(...)` at line 142. The new per-key clearing latch correctly refuses the start inside `start_thread` (registry.rs:414-416), but by then the gate has already been lowered. `clear_shielded_inner` (manager/mod.rs:390-415) holds the gate up via `hold_quiescing_gate()` precisely to keep direct `sync_now`/`sync_wallet` callers blocked across drain → liveness-check → wipe, and a single shared `AtomicFlagGuard` always clears the flag on drop — the clear path has no way to re-raise it. So an FFI `platform_wallet_manager_shielded_sync_start` (exported at rs-platform-wallet-ffi/src/shielded_sync.rs:57) landing during a clear will: (a) call `ShieldedSyncManager::start()` → `spawn_periodic_loop` → `reopen_quiescing_gate()` setting `quiescing=false`, (b) be refused by the clearing latch at the registry, (c) leave the gate down for the remainder of the clear. A direct `sync_now`/`sync_wallet` going through `begin_pass` then observes `quiescing=false` and slips past the barrier, re-persisting into the store the clear is about to wipe — the exact race the new latch was added to close. Fix by checking the clearing latch BEFORE reopening the gate (e.g. early-return from `spawn_periodic_loop` if `registry.is_clearing(worker)`), or ordering the gate reopen so it only fires when the registry actually accepts the start (e.g. inside the closure passed to `start_thread`, or only after a non-rejecting outcome). The same hazard exists structurally for the identity-sync and platform-address coordinators if they ever gain a clearing latch.
In `packages/swift-sdk/SwiftExampleApp/SwiftExampleApp/Core/Services/ShieldedService.swift`:
- [BLOCKING] packages/swift-sdk/SwiftExampleApp/SwiftExampleApp/Core/Services/ShieldedService.swift:608-628: Swift clear flow wipes SwiftData rows after Rust reports the shielded store was left intact
`platform_wallet_manager_shielded_clear` maps `PlatformWalletError::ShieldedShutdownIncomplete` to `ErrorShutdownIncomplete` and returns BEFORE touching the Rust shielded store (rs-platform-wallet-ffi/src/shielded_sync.rs:446-461). The Swift error contract states this explicitly: `PlatformWalletResult.swift:188-193` says on `shutdownIncomplete` "the host ... must NOT commit its own persistence wipe — the Rust store was left intact so it can be retried once the pass settles." But `ShieldedService.clearLocalState` catches the thrown error, logs it via `SDKLogger.error`, and falls through into `modelContext.delete(model: PersistentShieldedNote/OutgoingNote/SyncState/Activity)` at lines 624-627. If Rust refused the clear because a coordinator didn't drain in time, SwiftData rows are deleted while the Rust commitment tree remains populated — the exact cross-language desync the new FFI result code was added to prevent. The legacy `// Best-effort — failure logs but doesn't abort the wipe` comment at lines 606-607 predates this contract and is now wrong. The catch must distinguish `.shutdownIncomplete` from other errors and return early without wiping host rows.
In `packages/rs-dash-async/src/registry.rs`:
- [SUGGESTION] packages/rs-dash-async/src/registry.rs:596-613: ClearingGuard uses set-membership semantics — nested or overlapping guards release the latch on the first drop
`hold_clearing` inserts the key into a `BTreeSet<K>` and `ClearingGuard::drop` removes it unconditionally. With two live `ClearingGuard`s for the same key (nested call by future code, OR two concurrent FFI `platform_wallet_manager_shielded_clear` invocations on the same handle — `HandleStorage::with_item` only holds a read lock), the first guard to drop removes the set entry and the surviving guard's protection lapses silently. Today only `clear_shielded_inner` takes the latch and the test `hold_clearing_nested_guards_share_the_latch` (registry.rs:1857) documents the quirk via comment — but the public `pub fn hold_clearing` signature gives no hint to callers, and a concurrent host-thread clear is reachable through the FFI surface. Convert the backing store to `BTreeMap<K, NonZeroUsize>` so guards refcount/compose, or serialize `clear_shielded` per key at the manager boundary so re-entrant clears can't overlap.
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Latest delta (97e373b) only adds nonisolated(unsafe) to two Swift handler properties for Swift 6 strict-concurrency compliance — that change is narrow and sound. All five prior findings remain unaddressed at HEAD, including two blocking issues: (1) spawn_periodic_loop unconditionally lowers the shared quiescing gate before the per-key clearing latch can refuse the (re)start, opening a window where a direct sync_now/sync_wallet can persist into the store being cleared, and (2) Swift ShieldedService.clearLocalState wipes SwiftData rows even when Rust returns ErrorShutdownIncomplete — the comment now explicitly codifies 'best-effort, failure logs but doesn't abort the wipe', which is the opposite of the documented FFI contract.
🔴 2 blocking | 🟡 1 suggestion(s)
1 additional finding(s) omitted (not in diff).
2 carried-forward finding(s) already raised on this PR; not re-posting as new inline comments.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs`:
- [BLOCKING] packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs:131-161: [prior-1, carried forward] `spawn_periodic_loop` lowers the shared quiescing gate before the per-key clearing latch can refuse the (re)start
`spawn_periodic_loop` calls `self.reopen_quiescing_gate()` at line 137 — an unconditional `quiescing.store(false, Release)` on the `Arc<AtomicBool>` — BEFORE handing off to `registry.start_thread(...)` at line 142, where the per-key `hold_clearing(ShieldedSync)` latch refuses the start. During `clear_shielded_inner`, the clear flow continuously holds both the registry clearing latch and the quiescing gate (via `hold_quiescing_gate`, which is a non-restoring RAII guard). Race: (T1) clear holds latch + gate=true; (T2) host calls `shielded_sync_start()` → `spawn_periodic_loop` → `reopen_quiescing_gate` writes `false` → `start_thread` is refused by the latch, but the gate is already down; (T3) a direct `sync_now`/`sync_wallet` enters `begin_pass`, observes `quiescing == false`, takes the `is_syncing` slot, and runs to completion — persisting into the shielded store inside the clear-then-wipe window. The `quiesce_under_held_gate` debug_assert at line 219-222 catches the inconsistency only in debug builds; release builds proceed silently. Fix: only reopen the gate once `start_thread` confirms acceptance — either consult `lock_clearing` in `spawn_periodic_loop` and skip the reopen on refusal, move the reopen inside the registry's slot critical section after the latch check, or use the registry's accept signal to gate the store.
- [SUGGESTION] packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs:196-203: [prior-3, carried forward] Public `quiesce()` drops an `AtomicFlagGuard` on the same shared atomic that `clear_shielded` holds raised
`CoordinatorLifecycle::quiesce` constructs `_quiescing_gate = AtomicFlagGuard::new(&self.quiescing)` over the same `Arc<AtomicBool>` that `clear_shielded_inner` holds raised via `hold_quiescing_gate`. `AtomicFlagGuard::Drop` unconditionally writes `false`, so any concurrent caller routed through the public `quiesce()` while a clear is in flight will lower the clear-flow's continuously-held gate. Today the wallet's `shutdown` routes through `registry.shutdown()` rather than this method, but `ShieldedSyncManager::quiesce` remains `pub` and reachable via the `shielded_sync_arc` accessor — the trap is one accidental call away from re-opening the same lapse Clear was hardened to prevent. The structural fix is to give the gate a refcount/holder-aware re-raise (so concurrent owners compose), or to gate the public `quiesce` against the clearing latch and assert no other holder is active.
In `packages/swift-sdk/SwiftExampleApp/SwiftExampleApp/Core/Services/ShieldedService.swift`:
- [BLOCKING] packages/swift-sdk/SwiftExampleApp/SwiftExampleApp/Core/Services/ShieldedService.swift:606-628: [prior-2, carried forward] `clearLocalState` wipes SwiftData rows even when `clearShielded()` throws `ErrorShutdownIncomplete` — violates the documented FFI contract
`PlatformWalletFFIResultCode::ErrorShutdownIncomplete` is the explicit signal that the Rust shielded store was left intact: the FFI doc in `packages/rs-platform-wallet-ffi/src/error.rs` states 'the manager is NOT torn down and the store was left intact (Clear aborted before touching it). The host may retry the clear, and must not commit its own persistence wipe — doing so would desync the host's rows from the still-populated shared tree.' The current implementation at lines 608-616 catches the thrown error, logs it via `SDKLogger.error`, then falls through to step 2 (lines 623-628), unconditionally deleting `PersistentShieldedNote`, `PersistentShieldedOutgoingNote`, `PersistentShieldedSyncState`, and `PersistentShieldedActivity`. The comment at line 606-607 even codifies this: 'Best-effort — failure logs but doesn't abort the wipe.' That is the exact split-brain the FFI contract was designed to prevent: when Rust correctly preserves notes (orphan-alive backstop, drain timeout, non-clean drain status), Swift discards its mirror, and the next `bindShielded` + sync repopulates host rows from the surviving Rust state — the user-visible 'Clear' becomes a no-op on the Rust side while host data is lost. Minimum fix: inspect the thrown error code and short-circuit the SwiftData wipe (surfacing the error to the user / scheduler) on `errorShutdownIncomplete`.
…g gate when start will be refused (thepastaclaw BLOCKING)
5d569459 added a per-key clearing latch on `ThreadRegistry` and wired
`clear_shielded_inner` to hold it across the whole clear. But
`CoordinatorLifecycle::spawn_periodic_loop` called
`reopen_quiescing_gate()` BEFORE delegating to `start_thread`, where the
latch is checked. So a concurrent `shielded_sync().start()` during a
clear had:
1. `clear_shielded_inner` holding `_clearing_gate` (quiescing=true)
and `_clearing_latch` (registry refuses ShieldedSync (re)start).
2. The start path: `reopen_quiescing_gate()` → quiescing.store(false).
3. Then `start_thread()`: latch refuses the start, returns no-op.
4. Gate left FALSE despite `_clearing_gate` being held — a direct
`sync_now`/`sync_wallet` via `begin_pass` would now observe a
down gate and slip past the clear's "no new pass" barrier,
re-persisting into the store the clear is about to wipe.
Fix:
- New `ThreadRegistry::is_clearing(key) -> bool` public method exposes
the latch state so a coordinator can observe it before any wallet-
side side-effects (the registry only protects calls that go through
it; the quiescing flag is wallet-side).
- `spawn_periodic_loop` now checks `self.registry.is_clearing(self.worker)`
and bails BEFORE `reopen_quiescing_gate` — the start is refused without
touching the gate, preserving `clear_shielded`'s continuous SEC-002
hold.
Regression test:
`shielded_start_during_clear_preserves_quiescing_gate` raises both
guards `clear_shielded_inner` raises (latch + gate), calls `start()`,
asserts the start is refused AND the gate stays raised. Pre-fix this
asserts false — non-vacuous. New test-only accessors
`CoordinatorLifecycle::quiescing_load_for_test` and
`ShieldedSyncManager::quiescing_load_for_test` expose the flag read for
the assertion.
Verified: 35 dash-async / 314 platform-wallet --features shielded (was
313, +1 for the regression) / 108 platform-wallet-ffi --features
shielded all pass. cargo fmt --check + cargo clippy --workspace
--features shielded --all-targets -- -D warnings clean.
Addresses thepastaclaw BLOCKING finding (3480216817 + carry-forward
3480287460) on PR #3954.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
…own-join Single-merge approach (stack tip `origin/feat/platform-wallet-secret-protection`) folds all three open PRs in one shot: - #3692 feat(platform-wallet): watch-only rehydration from persistor (seedless load) - #3828 refactor(platform-wallet)!: retire in-band pool snapshot; hardcode core UTXO account_index=0 - #3953 feat(platform-wallet-storage)!: layered opt-in secret protection (Tier-2 password envelope) Incoming: 271 commits, 93 files, +9752 / -3042 vs merge-base e2039e5. Conflicts (3 content + 1 semantic-only) resolved by hand. Resolution policy: preserve our ThreadRegistry / CoordinatorLifecycle / shutdown invariants (F1, F2, SEC-001, SEC-002); accept their domain logic (rehydration paths, secret-protection storage, in-band pool retirement); port their additions onto our refactored types (`ShutdownReport<WalletWorker>`, `WorkerStatus`, the per-key clearing latch, `spawn_periodic_loop` hoist). - packages/rs-platform-wallet/src/changeset/core_bridge.rs: Kept OUR `CANCEL_DRAIN_BUDGET` and the cancel-drain helper plus its doc. Kept THEIR `warn_if_non_default_account` (account_index=0 domain guard called 2×) and THEIR `CoreChangeSet` projection tests. Dropped THEIR re-added `is_empty_no_records` impl — our prior PR refactored the call site to `Merge::is_empty()`, so re-adding the helper would be dead code (clippy fail). - packages/rs-platform-wallet/src/manager/platform_address_sync.rs and manager/shielded_sync.rs: Took OUR `CoordinatorLifecycle` refactor (lifecycle field + `spawn_periodic_loop`) over THEIR pre-refactor `background_cancel` / `background_generation` atomics. Their generation-guard regression test auto-merged and now validates OUR per-key clearing latch — ported its doc prose off the deleted `background_cancel` field name. - Semantic-only (no marker): the auto-merged shielded_sync.rs `make_manager()` test helper called the pre-refactor 2-arg `ShieldedSyncManager::new(event_manager, coordinator_slot)`. Our refactor added a 3rd `registry` param; ported it forward (`ThreadRegistry::new()` added as arg #3, mirroring the sibling address-sync test helper). - packages/rs-platform-wallet-ffi/src/core_wallet_types.rs: Ported one dangling `is_empty_no_records` comment ref → `is_empty`. No conflict markers remain anywhere; no dangling refs to `CoordinatorThreadStatus`, `CoordinatorExitStatus`, `background_cancel`, or `background_generation` in Rust source. Verified post-merge: - cargo fmt --all --check clean. - cargo clippy --workspace --features shielded --all-targets -- -D warnings clean (subsumes `cargo check` for the same feature surface — clippy is a strict superset). - dash-async lib: 35 passed (unchanged). - platform-wallet lib default: 217 passed (was 212; +5 from rehydration test additions). - platform-wallet lib --features shielded: 319 passed (was 313 pre-merge; +6 from the stack's test additions). - platform-wallet-ffi --features shielded: 108 passed (unchanged). - Zero test regressions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
…let-shutdown-join Pulls in the integration merge prepared in worktree branch `merge/pr-stack-3692-3828-3953` (commit 9b3d6f8 — see that commit's body for the full conflict-resolution table and verification). Sequencing: this merge lands AFTER 432309b (the thepastaclaw BLOCKING gate-lowering fix on spawn_periodic_loop), so the stack's post-merge state has BOTH the per-key clearing latch invariants from the gate-fix AND the rehydration/secret-protection domain code.
…lic quiesce against in-flight clear (thepastaclaw 3480216821/3480287466) Two remaining thepastaclaw findings on the 5d569459 per-key clearing latch addressed together. ## (1) ClearingGuard set-membership → refcount Backing store changes from `Mutex<BTreeSet<K>>` to `Mutex<BTreeMap<K, NonZeroUsize>>`. `hold_clearing` now increments a per-key holder count (or inserts `NonZeroUsize::new(1)`); the guard's `Drop` decrements and removes the entry only when the count reaches zero. Why: nested or concurrent holders for the same key previously released the latch on the FIRST drop (set semantics — `insert` of an existing key was a no-op, `remove` unconditionally removed). Two concurrent `shielded_clear` invocations on the same handle (reachable through `HandleStorage::with_item`'s read lock) would silently lapse one clear's protection when the other's guard dropped. Refcount composes the holders so the latch stays raised until the LAST guard drops. The previous test `hold_clearing_nested_guards_share_the_latch` documented the lapse via comment but never asserted the lapse — thepastaclaw correctly flagged that it would pass under any drop-only implementation. Renamed to `hold_clearing_inner_drop_does_not_lapse_outer_protection` and rewritten to assert non-vacuously: with both guards alive a start is refused; after `drop(inner)` while `outer` is still alive, `is_clearing` MUST still return true AND a fresh `start_clean` MUST still be refused; only after `drop(outer)` does the latch fully release and a start succeed. Pre-refactor this test fails on both the `is_clearing` assertion and the post-inner-drop `is_running` assertion. `Drop` guards against an underflow (debug_assert) if the entry is missing — defensive against out-of-band removal that no current call site does, but cheap to encode. ## (2) Public `quiesce()` gated against the clearing latch `CoordinatorLifecycle::quiesce` constructs a local `AtomicFlagGuard` on the same `quiescing` atomic that `clear_shielded_inner` holds raised continuously via `hold_quiescing_gate`. The guard's `Drop` would unconditionally lower the gate — lapsing the clear's "no new pass" barrier and letting a direct `sync_now`/`sync_wallet` slip past `begin_pass` to re-persist into the wiping store. Mirror the `spawn_periodic_loop` fix from 432309b: bail at the start of `quiesce` if `registry.is_clearing(self.worker)` returns true. Return `WorkerStatus::NotRunning` — the in-flight clear is already cancel-joining the worker under its own gate, so there is no new work for `quiesce` to do. Today only `ShieldedSyncManager::quiesce` is reachable via `shielded_sync_arc`, but the carry-forward concern is that the trap is one accidental host call away. Regression test `shielded_quiesce_during_clear_preserves_quiescing_gate` raises both guards `clear_shielded_inner` raises (latch + gate), calls `shielded_sync_manager.quiesce()`, asserts the call returns `NotRunning` AND the gate stays raised. Pre-fix this fails on the gate assertion. Verified post-fix: - cargo fmt --all --check clean. - cargo clippy --workspace --features shielded --all-targets -- -D warnings clean. - dash-async lib: 35 passed (renamed test stays in place). - platform-wallet --features shielded: 321 passed (was 320; +1 quiesce regression). - platform-wallet --lib default: 217 passed (unchanged). Closes the deferred thepastaclaw 3480216821 (refcount suggestion), 3480216826 (test-doesn't-exercise nitpick — the renamed test now does), and 3480287466 (prior-3 carry-forward, public quiesce vs. held gate). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BcZFae6ABdoMEvCBQZnaJF
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Carried forward from prior review (still valid at bcf9388): prior-2 (blocking — Swift clearLocalState still wipes SwiftData on ErrorShutdownIncomplete), prior-3 (suggestion — public quiesce() drops AtomicFlagGuard on the shared atomic), prior-4 (suggestion — ClearingGuard uses set membership), prior-5 (nitpick — nested-guards test doesn't exercise the documented lapse). prior-1 is partially addressed: the new is_clearing short-circuit + SEC-002 regression test close the common race, but the check and the gate-write are still two distinct critical sections, leaving a narrow residual TOCTOU window — carried forward as a suggestion. No new defects identified in the latest delta (rs-platform-wallet-storage / secrets / keyless rehydration look careful: fail-closed topology check, transactional rollback, per-row skip taxonomy, structured FFI outcome).
🔴 1 blocking | 🟡 3 suggestion(s) | 💬 1 nitpick(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/swift-sdk/SwiftExampleApp/SwiftExampleApp/Core/Services/ShieldedService.swift`:
- [BLOCKING] packages/swift-sdk/SwiftExampleApp/SwiftExampleApp/Core/Services/ShieldedService.swift:608-616: [prior-2, carried forward] `clearLocalState` wipes SwiftData rows even when `clearShielded()` throws `ErrorShutdownIncomplete` — violates the documented FFI contract
The Rust FFI contract for `ErrorShutdownIncomplete` on `platform_wallet_manager_shielded_clear` (rs-platform-wallet-ffi/src/error.rs around the `ShieldedShutdownIncomplete → ErrorShutdownIncomplete` mapping, plus shielded_sync.rs:454) is explicit: when the manager surfaces this code, the Rust shielded store was left intact because Clear aborted before touching it (drain non-clean / orphan-alive backstop). The host must retry the clear and must NOT commit its own persistence wipe; doing so desyncs host rows from the still-populated shared tree. ShieldedService.swift:608-616 catches the throw, logs via `SDKLogger.error`, then falls through unconditionally into the SwiftData delete block at 623-628 (`PersistentShieldedNote`, `PersistentShieldedOutgoingNote`, `PersistentShieldedSyncState`, `PersistentShieldedActivity`). The comment at 606-607 ("Best-effort — failure logs but doesn't abort the wipe") codifies the exact split-brain the FFI contract was designed to prevent. The Swift surface already distinguishes this code: `PlatformWalletResultCode.errorShutdownIncomplete` exists (PlatformWalletResult.swift:47) and decodes into `PlatformWalletError.shutdownIncomplete(detail)` (line 240), so the host can branch on it. Minimum fix: on the typed `shutdownIncomplete` case (or any thrown error from `clearShielded()`, since the Rust contract is fail-closed there), short-circuit the SwiftData wipe and surface the error to the user instead of proceeding.
In `packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs:137-148: [prior-1, partial fix landed] Residual TOCTOU between `is_clearing` check and `reopen_quiescing_gate`
The new `is_clearing(self.worker)` short-circuit at line 145 closes the common race (clear is already holding the latch when spawn arrives), and the SEC-002 regression test in mod.rs:1071-1108 covers that ordering. But the check (line 145) and the gate-lower (line 148) are not done under a single critical section: `is_clearing` takes `lock_clearing`, returns the boolean, releases the lock; then `reopen_quiescing_gate` writes `false` to the shared atomic. If `clear_shielded_inner` acquires the latch AND raises the quiescing gate between those two points, the reopen at 148 lowers a gate Clear believes it holds continuously across drain → liveness → wipe. In that window a direct `sync_now`/`sync_wallet` going through `begin_pass` (lines 306-341) observes `quiescing == false`, claims the `is_syncing` slot, runs the pass to completion, and persists into the store about to be wiped. `quiesce_under_held_gate`'s `debug_assert!` (line 230) catches the inconsistency only in debug builds; release silently proceeds with the drain. The window is microseconds-narrow, but the design invariant is continuous gate-hold. Structural fix: have the registry expose an atomic observe-and-act under one `lock_clearing()` scope (e.g. `with_clearing_check`) so the gate write is sequenced with the latch read, or move the gate write inside the registry slot critical section alongside the latch check.
- [SUGGESTION] packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs:207-214: [prior-3, carried forward] Public `quiesce()` drops an `AtomicFlagGuard` on the same shared atomic that `clear_shielded` holds raised
`CoordinatorLifecycle::quiesce` constructs `_quiescing_gate = AtomicFlagGuard::new(&self.quiescing)` over the same `Arc<AtomicBool>` that `clear_shielded_inner` holds raised via `hold_quiescing_gate`. `AtomicFlagGuard::Drop` unconditionally writes `false`, so any concurrent caller routed through the public `quiesce()` while a clear is in flight will lower the clear-flow's continuously-held gate. `ShieldedSyncManager::quiesce` remains `pub` and reachable through the manager's shielded accessor — one accidental call away from re-opening the same lapse Clear was hardened to prevent. Structural fix: refcount/owner-aware re-raise (so concurrent owners compose), or gate the public `quiesce` against the clearing latch and assert no other holder is active before its guard drop is allowed to fire.
In `packages/rs-dash-async/src/registry.rs`:
- [SUGGESTION] packages/rs-dash-async/src/registry.rs:596-613: [prior-4, carried forward] `ClearingGuard` uses set-membership — overlapping guards release the latch on the first drop
`hold_clearing` does `lock_clearing().insert(key)` into a `BTreeSet<K>`, and `ClearingGuard::Drop` (lines 1099-1103) unconditionally removes the key. With two live guards for the same key (a nested or overlapping clear from a future caller, a test, or an accidental shielded re-clear before the first guard drops), the first drop removes the key while the second guard is still alive — silently lapsing the latch protection the still-live guard is supposed to provide. The new `is_clearing()` check (line 620) inherits the same accounting weakness because it reads the same set membership. The test at lines 1859-1883 acknowledges this in its comment but does not fix the type-level mismatch between the RAII guard's compositional appearance and its set-based semantics. Minimum fix: refcount per key (`BTreeMap<K, NonZeroUsize>`), or add a `debug_assert!` that the insert was novel so accidental nesting fails loudly in debug.
Note: GitHub refused to serve this PR's diff via gh pr diff because it exceeds the 20,000-line limit, so this exact-SHA review is posted body-only. Existing inline threads are reused rather than duplicated.
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Latest delta resolves prior-4 and prior-5: ClearingGuard now refcounts via BTreeMap<K, NonZeroUsize> so nested/concurrent guards compose, and the new test hold_clearing_inner_drop_does_not_lapse_outer_protection is non-vacuous and asserts the lapse window. prior-2 (Swift ShieldedService wipe-on-error) remains unchanged and blocking. prior-3 is partially addressed by the new is_clearing bail in public quiesce(), but the same check-then-act TOCTOU pattern from prior-1 now exists at both spawn_periodic_loop (lines 145-148) and quiesce (lines 217-224); collapsed into a single structural finding.
🔴 1 blocking | 🟡 1 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/swift-sdk/SwiftExampleApp/SwiftExampleApp/Core/Services/ShieldedService.swift`:
- [BLOCKING] packages/swift-sdk/SwiftExampleApp/SwiftExampleApp/Core/Services/ShieldedService.swift:608-616: [prior-2, carried forward] clearLocalState wipes SwiftData rows even when clearShielded() throws — violates the FFI contract
Unchanged at 6b655aab. The Rust FFI contract for `platform_wallet_manager_shielded_clear` surfaces `ErrorShutdownIncomplete` specifically when the Rust shielded store was left intact (Clear aborted before touching it via the drain/orphan-alive backstop). The host MUST NOT commit its own persistence wipe in that case, or SwiftData rows desync from the still-populated shared tree — the exact split-brain this stack is designed to prevent.
Lines 608-616 catch the throw, log via `SDKLogger.error`, then fall through unconditionally into the SwiftData delete block at 623-628 (`PersistentShieldedNote`, `PersistentShieldedOutgoingNote`, `PersistentShieldedSyncState`, `PersistentShieldedActivity`). The comment at 606-607 ("Best-effort — failure logs but doesn't abort the wipe") codifies the wrong invariant. The Swift surface already distinguishes this code path: `PlatformWalletResultCode.errorShutdownIncomplete` decodes into `PlatformWalletError.shutdownIncomplete(detail)`, so the host can branch on it. Minimum fix: on any thrown error from `clearShielded()` (since the Rust contract is fail-closed there), short-circuit the SwiftData wipe and surface the error to the user.
In `packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/manager/coordinator_lifecycle.rs:145-224: [prior-1 + residual prior-3] Check-then-act TOCTOU around the shared `quiescing` atomic at both `spawn_periodic_loop` and `quiesce` call sites
Same structural pattern at two call sites — neither performs the `is_clearing` check atomically with the subsequent write to the shared `quiescing` atomic.
1. `spawn_periodic_loop` (145-148): `is_clearing(self.worker)` at 145 takes `lock_clearing`, returns the boolean, releases the lock; `reopen_quiescing_gate()` at 148 then writes `false` to the shared atomic. If `clear_shielded_inner` acquires the clearing latch AND raises the gate between those two operations, this `reopen_quiescing_gate()` lowers a gate Clear believes it holds continuously across drain → liveness → wipe — opening a window for a direct `sync_now`/`sync_wallet` to pass `begin_pass` and persist into the store about to be wiped (even though `start_thread` itself will subsequently be refused by the latch).
2. `quiesce` (217-224): the new fix to bail when `is_clearing` is true closes the dominant case (clear already holds the latch), but introduces the same race shape — `is_clearing` at 217 → unconditional `self.quiescing.store(true, ...)` + `AtomicFlagGuard::new(&self.quiescing)` at 223-224, whose `Drop` (`atomic.rs:25-29`) unconditionally writes `false`. If `clear_shielded_inner` acquires the latch + `hold_quiescing_gate` in the sub-microsecond window between 217 and 223, the racing quiesce installs an `AtomicFlagGuard` over the same atomic the in-flight clear is holding raised; on the quiesce's return the guard drops and lowers the gate, lapsing Clear's continuously-held barrier. The new regression test `shielded_quiesce_during_clear_preserves_quiescing_gate` only exercises the ordering where the clearing latch is acquired BEFORE the quiesce arrives.
Structural fix (resolves both call sites): expose an atomic observe-and-act on the registry under one `lock_clearing()` scope (e.g. a `with_clearing_check(F)` callback that runs F only while the clearing-set lock is held), or move the gate write inside the registry slot's critical section alongside the latch check. Alternatively, refcount the `quiescing` atomic so concurrent owners compose like `ClearingGuard` now does post-prior-4.
Note: Posted body-only because normal inline poster failed: command failed 1: python3 scripts/review_poster.py dashpay/platform 3954 6b655aa --dry-run.
Why this PR exists
Problem. The SPV wallet's three sync coordinators (identity / platform-address / shielded) and the event-adapter each ran their own ad-hoc worker lifecycle. The coordinators must run on dedicated OS threads (
std::thread+Handle::block_on) because the dash-sdk sync futures are!Send, so a tokio task group cannot host them. The teardown logic was triplicated across the three coordinators plus two parallel bookkeeping structures, and two teardown paths could leave a live coordinator thread detached while the host freed its context.What breaks without it (concrete):
shutdown()/quiesce()joined a coordinator by taking itsJoinHandleby value into a future wrapped in a 30 stokio::time::timeout. If the timeout fired mid-poll, the future dropped and the handle dropped with it — silently detaching a still-live OS thread without parking it. The nextdestroysawbackground_join == None→NotRunning→ reported clean, so the host freed the callback context while the detached thread could still fire into it.clear_shielded()wiped the shielded store gated only on the current coordinator quiesce being clean. A prior-generation shielded thread parked after a stop→start race still heldArcrefs to the persister being wiped, so the clear could run under a live writer.Blocking relationship. Builds on the in-branch shutdown-join work. Introduces a generic engine in the shared
dash-asynccrate thatrs-dapicould adopt later (nors-dapichange here).What was done?
ThreadRegistry<K>indash-async— one generic engine managing both!SendOS-thread workers andSendtokio tasks behind a uniform API: a generation-match epilogue, reap-or-park orphan tracking, weight-ordered gracefulshutdown(), and per-worker async drain hooks. Theis_syncing/quiescingdomain barrier stays in each coordinator and is handed to the registry as aDrainHook— the registry owns the lifecycle machinery, not the domain semantics.background_*fields, theCoordinatorOrphanstype); extracted aCoordinatorLifecyclehelper to dedup the remaining sharedis_syncing/quiescingprotocol — and after the latest review pass, the triplicated coordinatorstart()poll-loop body itself was hoisted intoCoordinatorLifecycle::spawn_periodic_loop(each coordinator now provides only its domain-specific pass thunk).JoinHandleowned by the registry slot (never moved into a cancellable future) and deterministically re-park on timeout; F2 fixed — a single key-scopedany_alive_for()gate over live + parked workers blocks the store wipe.quiesce, graceful thread-spawn-failure re-park with full slot-config rollback, panic-safe epilogue, a terminalclosinglatch, a restarted worker's prior parked under the slot lock,clear_shielded's drain bounded by a timeout (no host ANR) and itsquiescinggate held continuously across drain → liveness-check → wipe, and thequiescing↔is_syncinghandshake made self-fencing (SeqCst).shielded_sync_stopFFI — the bridge no longer raises its ownAtomicFlagGuardonquiescing, eliminating a race where a concurrent stop's guardDropcould lower the gateclear_shieldedholds continuously. The single join point for shielded teardown is nowdestroy(orshielded_clearfor the clear flow);shielded_sync_stopmatches the cancel-only convention ofidentity_sync_stopandplatform_address_sync_stop.ThreadRegistry::hold_clearing(key) -> ClearingGuardRAII latch refusesstart_thread/start_taskfor any key being cleared; wired intoclear_shielded_inneralongside the quiescing gate. Closes the residual where a fullshielded_sync_start()racingclear_shieldedcould otherwise reopen the gate and re-persist into the just-wiped store.wallet_event_adapter_loopnow drains the broadcast::Receiver via boundedtry_recv()on cancel before exiting, so an event emitted just before stop (notablyTransactionInstantLocked, which P2P does not replay) is still persisted. Bound (CANCEL_DRAIN_BUDGET = 4096) counts every attempt — includingLagged— toward the cap so a flood cannot stall teardown.CoordinatorThreadStatus/CoordinatorExitStatuswrappers were collapsed ontodash_async::WorkerStatus/ShutdownReport<WalletWorker>directly (both were introduced in this PR, never released — no released-API break).shutdown()now returnsShutdownReport<WalletWorker>; consumers re-export viaplatform_wallet::{ShutdownReport, WorkerStatus}. A never-started worker is encoded as key-absent-from-per_workerrather than a materializedNotRunningslot; theall_clean()verdict is unchanged.#[must_use]onShutdownReportandCoordinatorExitStatusso silently dropping the UAF-avoidance status now warns at compile time.How Has This Been Tested?
platform-walletlib: 212 (default) / 313 (--features shielded) passing — at/above baseline, no coverage regression. Net +3 over the baseline includes new T5/T11/cancel_all regressions.dash-async: 35 unit tests (+1 doctest), including the F1/F2 / cancel-drain bound / per-key clearing latch (3 latch tests) regressions. The F1/F2 + lifecycle fixes are proven non-vacuous (reverting each fix makes its regression test fail, then restored).platform-wallet-ffi: 108 (--features shielded) passing.cargo clippy --workspace --features shielded --all-targets -- -D warningsclean;cargo fmt --all --checkclean.(re)startfor the shielded key is no-op'd by the registry for the whole clear flow.Breaking Changes
PlatformWalletManager::shutdown()returnsdash_async::ShutdownReport<WalletWorker>(was()); FFIdestroymaps!all_clean()→ErrorShutdownIncomplete— hence the!in the title. A never-started worker is encoded as absent-from-per_workerrather thanNotRunningin a materialized field;all_clean()is unchanged.CoordinatorThreadStatus/CoordinatorExitStatus(both introduced in this PR, never released) were collapsed onto thedash_asyncequivalents; consumers re-export viaplatform_wallet::{ShutdownReport, WorkerStatus}.dash-async(ThreadRegistry,WorkerStatus,WorkerConfig,ShutdownWeight,ClearingGuard,ShutdownReport, …) and a new dev-onlydash-asynctest-utilfeature.shielded_sync_stopis now cancel-only (does not returnErrorShutdownIncomplete); the join + orphan-liveness contract for hosts moves todestroyandshielded_clear. SwiftPlatformWalletManager.deinitupdated accordingly.Checklist:
🤖 Co-authored by Claudius the Magnificent AI Agent