fix(replication): close on inbound message-handler errors instead of swallowing them (#440)#511
fix(replication): close on inbound message-handler errors instead of swallowing them (#440)#511kriszyp wants to merge 3 commits into
Conversation
…swallowing them (#440) An error escaping onWSMessage was logged and swallowed: the rest of the failed frame was silently dropped while later frames kept applying and confirming higher sequence ids — a permanent, undetected gap on the receiver (epic #430 Theme B / workstream #440 land-first safety item). Now the catch latches inbound processing off (queued frames on the messageProcessing chain must not commit past the hole) and closes with 1011 as a transient protocol close, so the normal retry path reconnects and resumes from the last durable cursor, re-streaming everything past it. Extracted as closeOnInboundMessageError with unit coverage, following the file's pure-decision-helper idiom. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…rame-vs-record error boundary (#440) Cross-model review findings on the previous commit: an unknown tableId made the inner decode-catch's own log line throw (unguarded tableDecoder.decoder deref), masking the original error and escaping to the outer close path by accident. Guard the derefs so the intended skip-and-log semantics and truthful diagnostics are restored. Also scope the closeOnInboundMessageError docstring: per-record value-decode failures are still skip-and-logged with the cursor advancing past them (residual tracked in #440 / #432). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces robust error handling for replication connections over WebSockets by implementing closeOnInboundMessageError. This function ensures that inbound processing is halted and the connection is closed with a transient error code (1011) when an unhandled error occurs, preventing silent data loss. Additionally, it guards against TypeErrors when logging decoding errors by using optional chaining on tableDecoder. The feedback suggests further strengthening this error handling by making the logger dependency optional to prevent secondary crashes, and improving the log output formatting when tableDecoder is undefined to avoid string-concatenating undefined values.
|
Reviewed; no blockers found. |
…review) Gemini findings: guard the logger access fully (the log must never prevent the close) and make the decode-error log readable when the table decoder is unknown. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
Lands the first safety item of workstream #440 (epic #430 Theme B/F): an error escaping
onWSMessage— the inbound handler inreplicateOverWS— was logged and swallowed, silently dropping the rest of the failed frame while later frames kept applying and confirming higher sequence ids: a permanent, undetected gap on the receiver. The catch now callscloseOnInboundMessageError, which latches inbound processing off before closing (frames already queued on themessageProcessingchain must not commit past the hole; the peer can still deliver frames until the close handshake completes) and closes with 1011 as a transient protocol close, so the normal retry path reconnects and resumes from the last durable cursor. A deterministic poison frame becomes a visible, backed-off reconnect loop instead of silent loss (escalation budget = W2 #432).Follow-up commit from cross-model review: the inner per-record decode catch's log line dereferenced
tableDecoder.decoderunguarded, so an unknowntableIdthrew a secondary TypeError from the logger itself — masking the original error and escaping to the outer catch by accident. Now guarded, restoring the intended skip-and-log semantics with truthful diagnostics.Where to look
isSkippableLeadingDuplicatestore read,waitForDrain, frame decode) — routine/transient cases either have inner catches or only reject on substantive failure/teardown (where the latch is already set and close is a no-op). If you know of an error class in this path that is routine under normal operation, that's the thing to veto.decodeBlobsWithWritescatch, andmaxBatchVersionstill advances past the skipped record — the same silent-gap class, now explicitly tracked in Replication W11: Code organization, protocol versioning & decode-loop safety #440 with W2 Replication W2: Cursor correctness & divergence detection #432 (it's the #1163/#1453 structure-fork lineage, so closing there has real blast radius and deserves its own change).closeOnInboundMessageErroris extracted + unit-tested per the file's pure-decision-helper idiom (latch-before-close ordering pinned by test).Cross-model review (Codex + Gemini + domain adjudication) found no blockers; both significant findings are addressed or explicitly deferred as above. Not user-facing; no docs change needed.
Generated by an LLM (Claude Fable 5), reviewed per the Harper DLC.