feat: incremental db push (baseline + row-delta) — #17#7
Closes the round-3 large-DB grab-bag:

- db push --verify: poll the site URL after import until it answers HTTP (large imports can briefly return 000 right after import/flush); reported as `verified` in the summary
- db push --sr-tables <table...>: scope --search-replace to specific tables instead of --all-tables (faster on big DBs whose bulk has no URLs)
- db backups list / db backups prune --keep/--older-than/--force: manage the ~/db-backup-*.sql.gz files db push leaves behind

Extracted waitForHttp to lib/http-ready.ts (shared by sites create + db push). New pure libs (http-ready, db-backups) with unit tests; full suite 341 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
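To make the `db backups prune --keep/--older-than` behaviour concrete, here is a minimal sketch of a pure selection function of the kind the db-backups lib might contain. The names `BackupFile`, `PruneOptions` and `selectBackupsToPrune` are hypothetical, and the way the two flags combine (always protect the newest `keep` files, then apply the age filter to the rest) is an assumption, not something the commit message states.

```typescript
// Hypothetical shapes; the real lib may model backups differently.
interface BackupFile {
  name: string;    // e.g. "db-backup-<timestamp>.sql.gz"
  mtimeMs: number; // last-modified time in epoch milliseconds
}

interface PruneOptions {
  keep?: number;        // assumed: number of newest backups that are never pruned
  olderThanMs?: number; // assumed: only prune backups older than this age
  now: number;          // injected clock so the function stays pure
}

// Returns the names of the backups that would be deleted, newest first.
function selectBackupsToPrune(files: BackupFile[], opts: PruneOptions): string[] {
  const newestFirst = files.slice().sort((a, b) => b.mtimeMs - a.mtimeMs);
  const keep = opts.keep === undefined ? 0 : opts.keep;
  return newestFirst
    .slice(keep) // protect the newest `keep` files
    .filter((f) => opts.olderThanMs === undefined || opts.now - f.mtimeMs > opts.olderThanMs)
    .map((f) => f.name);
}
```

Keeping the clock and the file list as inputs means the selection can be unit-tested without touching the filesystem; the caller would do the actual listing and deletion.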
Opt-in `db push --incremental` ships only what changed since the last push: diff the dump against a per-site baseline (~/.instawp/baselines/<id>/) and apply a minimal REPLACE/DELETE set (no DROP/CREATE). First run / schema change / --full do a normal full push and refresh the baseline.

- Pure, unit-tested engine: lib/db-delta.ts (tuple/INSERT/CREATE parsing, schema fingerprint, PK-keyed diff, prefix remap) + lib/db-baseline.ts (per-site store).
- Reuses the existing safety machinery (remote backup, role/cap remap, scoped search-replace, --verify). Requires a per-row dump (mysqldump --skip-extended-insert --order-by-primary; extended dumps rejected).
- MVP: single-PK tables delta; no-PK tables or any DDL change auto-fall-back to a full push.

The full `db push` path is byte-for-byte unchanged: the incremental branch is skipped entirely unless --incremental/--full is passed. Full suite 382 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
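The PK-keyed diff described in this commit can be illustrated with a small sketch. It is not the code in lib/db-delta.ts: `Rows`, `TableDelta`, `diffRows` and `toSql` are hypothetical names, and rows are assumed to be already parsed into a map from the primary-key literal to the raw tuple text.

```typescript
// Assumed representation: PK literal as it appears in the dump (e.g. "42")
// mapped to the full tuple text of that row (e.g. "(42,'hello')").
type Rows = Map<string, string>;

interface TableDelta {
  replace: string[]; // tuples that are new or whose content changed
  delete: string[];  // PK literals present in the baseline but gone now
}

function diffRows(baseline: Rows, current: Rows): TableDelta {
  const replace: string[] = [];
  const del: string[] = [];
  current.forEach((tuple, pk) => {
    if (baseline.get(pk) !== tuple) replace.push(tuple);
  });
  baseline.forEach((_tuple, pk) => {
    if (!current.has(pk)) del.push(pk);
  });
  return { replace, delete: del };
}

// Renders the delta as row-level statements only: no DROP, no CREATE.
function toSql(table: string, pkColumn: string, delta: TableDelta): string[] {
  const stmts = delta.replace.map((t) => `REPLACE INTO \`${table}\` VALUES ${t};`);
  for (const pk of delta.delete) {
    stmts.push(`DELETE FROM \`${table}\` WHERE \`${pkColumn}\` = ${pk};`);
  }
  return stmts;
}
```

Because `REPLACE` covers both inserts and updates, the delta needs only two statement kinds, which is why a single-column primary key is enough to drive it.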
Claude encountered an error. I'll analyze this and get back to you.
…ngages (#17)

Reviewer found --incremental was a correct no-op: schemaFingerprint()'s CREATE TABLE regex was un-anchored, so "CREATE TABLE ... );" text inside row data (posts/postmeta documenting SQL) matched too, lazily sweeping volatile content into the fingerprint → it changed on every data edit → baseline.fingerprint mismatched every run → always "Full push (schema changed)".

Anchor both db-delta regexes to line-start (^ + m): real mysqldump DDL is at column 0; per-row INSERTs are single physical lines (newlines escaped), so a data occurrence can't match. Adds an adversarial regression test (row data containing "CREATE TABLE `evil` (...);"). The cause was not AUTO_INCREMENT (already stripped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
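The effect of anchoring can be reproduced in a few lines. The two patterns below are illustrative stand-ins, not the regexes from lib/db-delta.ts, and `tableNames` is a hypothetical helper; the sample dump mirrors the adversarial case from the regression test.

```typescript
// A tiny per-row dump: one real CREATE TABLE at column 0, and one row whose
// data happens to contain CREATE TABLE text.
const sampleDump = [
  "CREATE TABLE `wp_posts` (",
  "  `ID` bigint NOT NULL,",
  "  PRIMARY KEY (`ID`)",
  ") ENGINE=InnoDB;",
  "INSERT INTO `wp_posts` VALUES (1,'CREATE TABLE `evil` (x int);');",
].join("\n");

// Un-anchored: matches CREATE TABLE anywhere, including inside row data.
const unanchoredDdl = /CREATE TABLE `([^`]+)` \([\s\S]*?\);/g;

// Anchored with ^ and the m flag: the statement must start at the beginning
// of a line, and its closing ")" must also start a line.
const anchoredDdl = /^CREATE TABLE `([^`]+)` \([\s\S]*?^\)[^\n]*;/gm;

// Collects the table name captured by each match of `re` in `sql`.
function tableNames(re: RegExp, sql: string): string[] {
  const names: string[] = [];
  re.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(sql)) !== null) names.push(m[1]);
  return names;
}
```

With this input the un-anchored pattern reports both `wp_posts` and `evil`, while the anchored one reports only `wp_posts`, so edits to row data no longer leak into a fingerprint built from the matches.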
Great catch: confirmed and fixed in

Fix: anchored both db-delta regexes to line-start (

Regression test added (your suggested adversarial case): a

Still live-untested (your point stands): with deltas now engaging, the SSH delta-apply path (REPLACE/DELETE, prefix-remapped + URL-remapped) will execute for the first time on your re-run. So this is GO for re-validation, not for release yet.

Over to you: clear the old 181 MB baseline so the fixed fingerprint re-bases cleanly, then re-run

And yes, please do log the validation result into the #17 section of the feedback doc; that closes the loop nicely. Thanks for the thorough harness.
Reviewer (round 2): --incremental fell back to full on every real WP site. computeDelta required a single-column PK for EVERY data-bearing table, and wp_term_relationships (composite PK, populated everywhere) tripped the gate unconditionally → "no single-column primary key" every run.

Now a table without a single-column PK is row-diffed only if it can be; if it has NO usable PK it's IGNORED when its row set is unchanged (the common case) and only forces a full fallback when it actually changed. Single-PK tables (posts/options/etc.) delta cleanly alongside an unchanged composite-PK table.

parseInserts now collects all tables' rows (no PK gating); added sameRowSet for the unchanged-check + a B2 regression test. 388 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
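The relaxed gate can be summarised as a small decision function. This is a simplified sketch under stated assumptions, not computeDelta itself: `TableState`, `Plan` and `planPush` are hypothetical names, and each table is reduced to two facts (whether it has a single-column PK, and whether its row set is unchanged).

```typescript
interface TableState {
  hasSinglePk: boolean;     // true for tables like posts/options
  rowSetUnchanged: boolean; // result of an unchanged-check such as sameRowSet
}

type Plan =
  | { mode: "delta"; deltaTables: string[]; ignored: string[] }
  | { mode: "full"; reason: string };

// Decides between a row-delta push and a full fallback for a whole dump.
function planPush(tables: Record<string, TableState>): Plan {
  const deltaTables: string[] = [];
  const ignored: string[] = [];
  for (const name of Object.keys(tables)) {
    const t = tables[name];
    if (t.hasSinglePk) {
      deltaTables.push(name); // row-diffed by primary key
    } else if (t.rowSetUnchanged) {
      ignored.push(name);     // nothing to ship, so no PK is needed
    } else {
      // Changed rows with no single-column key cannot be expressed as a delta here.
      return { mode: "full", reason: `${name} changed and has no single-column primary key` };
    }
  }
  return { mode: "delta", deltaTables, ignored };
}
```

The old behaviour corresponds to returning `full` as soon as any table lacks a single-column PK, which is why a populated `wp_term_relationships` blocked every run.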
…#17, B3)

Round 3: --incremental engaged but OOM'd (~4 GB) on a 166 MB DB. computeDelta loaded BOTH full dumps as strings + per-row Maps for both (~20x blowup).

Redesigned the engine (lib/db-delta.ts) to be streaming + manifest-based:

- baseline is stored as a compact per-row HASH MANIFEST (PK -> content hash), not the full SQL (lib/db-baseline.ts now persists manifest.json);
- the current dump is STREAMED line-by-line (gz-decompressed on the fly) via a line-oriented state machine, which also makes the B1 class of bug structurally impossible (a "CREATE TABLE" inside an INSERT line is never treated as DDL);
- diffAgainstManifest builds the next manifest as it streams, so re-basing needs no extra pass; composite/no-PK tables are change-detected via an order-independent aggregate (count + sum of row hashes).

Peak engine memory is now ~manifest-sized: measured 356 MB on an 860K-row / 72 MB dump (~3x the real DB's single-PK row count), vs the old ~4 GB OOM. db push --incremental still requires a per-row dump (extended-insert rejected).

Rewrote db-delta/db-baseline tests for the streaming API (B1, B2, schema-change, composite-change, extended-insert, serialize round-trip). Full suite 380 green. The full db push path remains byte-for-byte unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
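A rough sketch of the two ideas in this redesign, a per-row hash manifest that is diffed while streaming and an order-independent aggregate for tables without a usable PK, might look as follows. Everything here is an assumption for illustration: `hash32` is a 32-bit FNV-1a stand-in for whatever content hash the engine really uses, and `createDiffer` and `aggregate` are hypothetical names rather than the API of lib/db-delta.ts.

```typescript
// FNV-1a, 32-bit. A stand-in only: a real engine would want a wider hash.
function hash32(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

type Manifest = Map<string, number>; // PK -> row content hash

// Diffs one table against its baseline manifest, one streamed row at a time.
// Only hashes are retained, never the row text of the whole table.
function createDiffer(baseline: Manifest) {
  const next: Manifest = new Map();
  const changed: string[] = [];
  return {
    visit(pk: string, tuple: string): void {
      const h = hash32(tuple);
      next.set(pk, h); // the next baseline is built during the same pass
      if (baseline.get(pk) !== h) changed.push(pk); // new or modified row
    },
    finish(): { changed: string[]; deleted: string[]; next: Manifest } {
      const deleted: string[] = [];
      baseline.forEach((_h, pk) => {
        if (!next.has(pk)) deleted.push(pk);
      });
      return { changed, deleted, next };
    },
  };
}

// Order-independent change detector for composite/no-PK tables:
// row count plus the sum of row hashes (mod 2^32).
function aggregate(rows: string[]): { count: number; sum: number } {
  let sum = 0;
  for (let i = 0; i < rows.length; i++) sum = (sum + hash32(rows[i])) >>> 0;
  return { count: rows.length, sum };
}
```

In a real streaming engine `visit` would also emit the `REPLACE` for a changed row immediately, since the tuple text is not kept afterwards. The aggregate can only say "changed or not", which matches its use as a trigger for the full fallback rather than as a source of row-level deltas.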
Implements the #17 enhancement (site-owner spec): `db push --incremental` ships only the row-level delta since the last push instead of the whole DB. Stacked on #6 (base = its branch) since both touch `db.ts`; review/merge #6 first, then this retargets to `main`.

What it does

- `db push --incremental` diffs the dump against a per-site baseline (`~/.instawp/baselines/<id>/`: the last-pushed canonical dump + a schema fingerprint) and applies a minimal `REPLACE`/`DELETE` set (no `DROP`/`CREATE`).
- First run / schema change / `--full` → a normal full push that refreshes the baseline. (DDL changes can't be a row-delta, so they auto-full.)
- Reuses the existing safety machinery: remote backup, role/cap remap, scoped `--search-replace`, `--verify`.
- Requires a per-row dump (`mysqldump --skip-extended-insert --order-by-primary`); an extended-insert dump is rejected with guidance.

"Without disturbing the core"

The full `db push` path is byte-for-byte unchanged. Incremental is one early branch (skipped entirely unless `--incremental`/`--full` is set) plus a trailing baseline-save after a successful full (re)base. All the risky logic lives in new pure libs.

Safety / MVP scope

- `lib/db-delta.ts` (pure: tuple/INSERT/CREATE parsing, AUTO_INCREMENT-stable schema fingerprint, PK-keyed diff, dump→remote prefix remap): 14 unit tests.
- `lib/db-baseline.ts`: 2 round-trip tests.
- `--no-backup`); `--full` is the always-works escape hatch; `--json` requires `--force`.

Known MVP limitations (documented)

- `--skip-extended-insert`; newlines in data are escaped by mysqldump).

Tests

Full suite 382 green; `tsc` clean; `db push --help` shows `--incremental`/`--full`.

🤖 Generated with Claude Code
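The per-row dump requirement above implies the tool has to recognise an extended-insert dump in order to reject it. One way to do that, sketched here under assumptions rather than taken from `lib/db-delta.ts`, is to count the top-level tuples on each INSERT line while respecting quoted strings and backslash escapes. `countTuples` and `isExtendedInsert` are hypothetical names.

```typescript
// Counts top-level "(...)" tuples in the VALUES part of one INSERT line.
// Assumes mysqldump-style quoting: single-quoted strings, backslash escapes.
function countTuples(insertLine: string): number {
  const marker = " VALUES ";
  const start = insertLine.indexOf(marker);
  if (start < 0) return 0;
  let depth = 0;
  let inString = false;
  let count = 0;
  for (let i = start + marker.length; i < insertLine.length; i++) {
    const ch = insertLine[i];
    if (inString) {
      if (ch === "\\") i++;                  // skip the escaped character
      else if (ch === "'") inString = false; // closing quote ('' reopens on the next char)
    } else if (ch === "'") {
      inString = true;
    } else if (ch === "(") {
      if (depth === 0) count++;              // a new top-level tuple begins
      depth++;
    } else if (ch === ")") {
      depth--;
    }
  }
  return count;
}

// A per-row dump has exactly one tuple per INSERT line.
function isExtendedInsert(insertLine: string): boolean {
  return countTuples(insertLine) > 1;
}
```

Tracking string state matters because row data can legitimately contain `),(`; a naive search for that sequence would misclassify a valid per-row dump.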