feat: incremental db push (baseline + row-delta) — #17 #7

Merged
vikasiwp merged 5 commits into main from fix/cli-db-incremental-17 on Jun 26, 2026

@vikasiwp

Contributor

Implements the #17 enhancement (site-owner spec): db push --incremental ships only the row-level delta since the last push instead of the whole DB. Stacked on #6 (base = its branch) since both touch db.ts; review/merge #6 first, then this retargets to main.

What it does

  • db push --incremental diffs the dump against a per-site baseline (~/.instawp/baselines/<id>/ — the last-pushed canonical dump + a schema fingerprint) and applies a minimal REPLACE/DELETE set (no DROP/CREATE).
  • First run / schema (DDL) change / --full → a normal full push that refreshes the baseline. (DDL changes can't be a row-delta, so they auto-full.)
  • Reuses the existing safety machinery: remote backup first, prefix + role/capability-key remap, scoped URL --search-replace, --verify.
  • Requires a per-row dump (mysqldump --skip-extended-insert --order-by-primary); an extended-insert dump is rejected with guidance.
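
The full-versus-delta decision described in the list above can be sketched as a small pure function. This is only an illustration under stated assumptions: `planPush`, `PushFlags`, `Baseline`, and `PushPlan` are hypothetical names, and the real branch in db.ts may be structured differently.

```typescript
// Hypothetical shapes; the real types in db.ts / lib/db-baseline.ts may differ.
interface Baseline {
  fingerprint: string; // schema fingerprint recorded at the last push
}

interface PushFlags {
  incremental: boolean;
  full: boolean;
}

type PushPlan =
  | { mode: "full"; reason: string; saveBaseline: boolean }
  | { mode: "delta" };

function planPush(
  flags: PushFlags,
  baseline: Baseline | null,
  currentFingerprint: string,
): PushPlan {
  // Neither flag set: the untouched full-push path, no baseline bookkeeping.
  if (!flags.incremental && !flags.full) {
    return { mode: "full", reason: "default", saveBaseline: false };
  }
  // --full is the escape hatch: always a full push that refreshes the baseline.
  if (flags.full) {
    return { mode: "full", reason: "--full requested", saveBaseline: true };
  }
  // First incremental run: nothing to diff against yet.
  if (baseline === null) {
    return { mode: "full", reason: "first run (no baseline)", saveBaseline: true };
  }
  // DDL changes cannot be expressed as a row delta, so they fall back to full.
  if (baseline.fingerprint !== currentFingerprint) {
    return { mode: "full", reason: "schema changed", saveBaseline: true };
  }
  return { mode: "delta" };
}
```

Keeping the decision in one pure function makes the three fallback cases (first run, schema change, `--full`) easy to unit-test without touching the remote site.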

"Without disturbing the core"

The full db push path is byte-for-byte unchanged. Incremental is one early branch (skipped entirely unless --incremental/--full is set) plus a trailing baseline-save after a successful full (re)base. All the risky logic lives in new pure libs.

Safety / MVP scope

  • Data-mutating logic is isolated in lib/db-delta.ts (pure: tuple/INSERT/CREATE parsing, AUTO_INCREMENT-stable schema fingerprint, PK-keyed diff, dump→remote prefix remap) — 14 unit tests.
  • Baseline store lib/db-baseline.ts — 2 round-trip tests.
  • MVP: only single-column-PK tables get a delta; a table without one — or any DDL change — auto-falls-back to a full push (safe default). Per-table whole-push for PK-less tables is a future optimization.
  • Remote backup is always taken before applying the delta (unless --no-backup); --full is the always-works escape hatch; --json requires --force.
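
A PK-keyed diff of the kind described above could look roughly like this. It is a sketch, not the code in lib/db-delta.ts: `diffTable`, `RowsByPk`, and `TableDelta` are hypothetical names, and it assumes each row is keyed by its primary-key value written as the SQL literal that appears in the dump, with the raw tuple text as the value.

```typescript
// PK literal as it appears in the dump (e.g. "42") -> raw "(...)" tuple text.
type RowsByPk = Map<string, string>;

interface TableDelta {
  replaces: string[]; // rows that are new or whose tuple bytes changed
  deletes: string[];  // rows present in the baseline but gone from the dump
}

function diffTable(
  table: string,
  pkColumn: string,
  baseline: RowsByPk,
  current: RowsByPk,
): TableDelta {
  const replaces: string[] = [];
  const deletes: string[] = [];

  // New or changed rows: REPLACE covers both the INSERT and the UPDATE case.
  for (const [pk, tuple] of current) {
    if (baseline.get(pk) !== tuple) {
      replaces.push(`REPLACE INTO \`${table}\` VALUES ${tuple};`);
    }
  }

  // Rows that disappeared since the last push.
  for (const pk of baseline.keys()) {
    if (!current.has(pk)) {
      deletes.push(`DELETE FROM \`${table}\` WHERE \`${pkColumn}\` = ${pk};`);
    }
  }

  return { replaces, deletes };
}
```

Because the comparison is byte-equality on the tuple text, it relies on the dump being deterministic, which is why the per-row, primary-key-ordered mysqldump output matters.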

Known MVP limitations (documented)

  • Loads both dumps into memory to diff (local export+diff stays full-cost, as the spec expects; the win is transfer/import).
  • Diff keys on byte-equal row tuples (deterministic from mysqldump) and assumes one INSERT per line (true for --skip-extended-insert; newlines in data are escaped by mysqldump).

Tests

Full suite 382 green; tsc clean; db push --help shows --incremental/--full.

🤖 Generated with Claude Code

vikasprogrammer and others added 2 commits June 25, 2026 20:06
Closes the round-3 large-DB grab-bag:
- db push --verify: poll the site URL after import until it answers HTTP
  (large imports can briefly return 000 right after import/flush); reported
  as `verified` in the summary
- db push --sr-tables <table...>: scope --search-replace to specific tables
  instead of --all-tables (faster on big DBs whose bulk has no URLs)
- db backups list / db backups prune --keep/--older-than/--force: manage the
  ~/db-backup-*.sql.gz files db push leaves behind

Extracted waitForHttp to lib/http-ready.ts (shared by sites create + db push).
New pure libs (http-ready, db-backups) with unit tests; full suite 341 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
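
The polling behaviour that --verify relies on can be sketched as follows. The signature of the real waitForHttp helper in lib/http-ready.ts is not shown in this thread, so `pollUntilHttp` and its injectable `probe` callback are hypothetical; the sketch also assumes that any non-zero HTTP status counts as "the site answered" and that 0 stands for curl's 000 (no response).

```typescript
// Resolves to an HTTP status code, or 0 when the site gave no response at all.
type Probe = () => Promise<number>;

async function pollUntilHttp(
  probe: Probe,
  opts: { timeoutMs: number; intervalMs: number },
): Promise<boolean> {
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    let status = 0;
    try {
      status = await probe();
    } catch {
      status = 0; // connection refused or reset: same as "no answer yet"
    }
    if (status > 0) return true; // any HTTP answer means the site is back
    if (Date.now() >= deadline) return false; // give up; caller reports unverified
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}
```

Injecting the probe keeps the loop testable without network access; a caller would typically wrap an HTTP request to the site URL in it.
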
Opt-in `db push --incremental` ships only what changed since the last push:
diff the dump against a per-site baseline (~/.instawp/baselines/<id>/) and apply
a minimal REPLACE/DELETE set (no DROP/CREATE). First run / schema change / --full
do a normal full push and refresh the baseline.

- Pure, unit-tested engine: lib/db-delta.ts (tuple/INSERT/CREATE parsing, schema
  fingerprint, PK-keyed diff, prefix remap) + lib/db-baseline.ts (per-site store).
- Reuses the existing safety machinery (remote backup, role/cap remap, scoped
  search-replace, --verify). Requires a per-row dump
  (mysqldump --skip-extended-insert --order-by-primary; extended dumps rejected).
- MVP: single-PK tables delta; no-PK tables or any DDL change auto-fall-back to
  a full push.

The full `db push` path is byte-for-byte unchanged — the incremental branch is
skipped entirely unless --incremental/--full is passed. Full suite 382 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@claude

claude Bot commented Jun 25, 2026

Claude encountered an error.

I'll analyze this and get back to you.

…ngages (#17)

Reviewer found --incremental was effectively a no-op (correct, but always a full
push): schemaFingerprint()'s CREATE TABLE regex was un-anchored, so
"CREATE TABLE ... );" text inside row data (posts/postmeta documenting SQL)
matched too, lazily sweeping volatile content into the fingerprint → it changed
on every data edit → baseline.fingerprint mismatched every run → always
"Full push (schema changed)".

Anchor both db-delta regexes to line-start (^ + m): real mysqldump DDL is at
column 0; per-row INSERTs are single physical lines (newlines escaped), so a
data occurrence can't match. Adds an adversarial regression test (row data
containing "CREATE TABLE `evil` (...);"). Not AUTO_INCREMENT (already stripped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vikasprogrammer

Contributor

Great catch — confirmed and fixed in e94c9fb. Root cause was exactly as you pinpointed: schemaFingerprint()'s CREATE TABLE regex (db-delta.ts) was un-anchored, so the "CREATE TABLE … );" text living in 17 posts + 14 postmeta rows matched too and lazily swept volatile row content into the fingerprint → it changed every run → baseline.fingerprint !== fingerprint was always true → "Full push (schema changed)" every time. (Not AUTO_INCREMENT — that was already stripped.)

Fix: anchored both db-delta regexes to line-start (^ + m flag) — schemaFingerprint() and parseCreateTables() (same latent flaw). Real mysqldump DDL is at column 0; per-row INSERTs are single physical lines (newlines escaped as \n text), so a data occurrence can't match.
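
The effect of anchoring can be reproduced in a few lines. The patterns below are simplified stand-ins, since the actual regexes in db-delta.ts are not quoted in this thread; `tableNames` and the sample dump are hypothetical.

```typescript
// Simplified stand-ins for the real patterns: match only the statement header.
const UNANCHORED = /CREATE TABLE `([^`]+)`/g;
const ANCHORED = /^CREATE TABLE `([^`]+)`/gm; // ^ + m: must start a physical line

const sampleDump = [
  "CREATE TABLE `wp_posts` (",
  "  `ID` bigint unsigned NOT NULL AUTO_INCREMENT,",
  "  `post_content` longtext NOT NULL,",
  "  PRIMARY KEY (`ID`)",
  ") ENGINE=InnoDB;",
  // Row data that documents SQL; with a per-row dump this stays on one line.
  "INSERT INTO `wp_posts` VALUES (1,'How to: CREATE TABLE `evil` (x int);');",
].join("\n");

function tableNames(dump: string, pattern: RegExp): string[] {
  const re = new RegExp(pattern.source, pattern.flags); // fresh lastIndex
  const names: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(dump)) !== null) names.push(m[1]);
  return names;
}

console.log(tableNames(sampleDump, UNANCHORED)); // ["wp_posts", "evil"]
console.log(tableNames(sampleDump, ANCHORED));   // ["wp_posts"]
```

The un-anchored pattern picks up `evil` from inside post_content, while the anchored one only sees the DDL that starts at column 0.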

Regression test added (your suggested adversarial case): a wp_posts row whose post_content contains CREATE TABLE `evil` (x int);, with a data-only change between baseline and current. Two assertions: fingerprint is stable across the data change, and computeDelta returns mode: 'delta' (1 REPLACE) rather than a full fallback. Both fail on the old regex, pass on the fix. Full suite 386 green.

Still live-untested (your point stands): with deltas now engaging, the SSH delta-apply path (REPLACE/DELETE, prefix-remapped + URL-remapped) will execute for the first time on your re-run. So this is GO for re-validation, not for release yet.

Over to you: clear the old 181 MB baseline so the fixed fingerprint re-bases cleanly, then re-run /tmp/iwp-incr-test.sh against instawp-marketing. If deltas both engage (no "schema changed" on an unchanged-schema cycle) and apply clean (UPDATE/INSERT/DELETE land, pages 200), it's a GO for beta.26.

And yes — please do log the validation result into the #17 section of the feedback doc; that closes the loop nicely. Thanks for the thorough harness.

@claude

claude Bot commented Jun 25, 2026

Claude encountered an error.

I'll analyze this and get back to you.

Reviewer (round 2): --incremental fell back to full on every real WP site.
computeDelta required a single-column PK for EVERY data-bearing table, and
wp_term_relationships (composite PK, populated everywhere) tripped the gate
unconditionally → "no single-column primary key" every run.

Now a table is row-diffed only when it has a single-column PK. A table with NO
usable PK is IGNORED when its row set is unchanged (the common case) and forces
a full fallback only when it actually changed. Single-PK tables
(posts/options/etc.) therefore delta cleanly alongside an unchanged
composite-PK table.

parseInserts now collects all tables' rows (no PK gating); added sameRowSet for
the unchanged-check + a B2 regression test. 388 green.
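
The gating described above might be sketched like this. The thread names sameRowSet but does not show its body, so the version here is an assumption (an order-insensitive multiset comparison of raw tuples), and `classifyTable` with its verdict names is hypothetical.

```typescript
// Assumed behaviour: true when both sides hold the same rows, ignoring order.
function sameRowSet(a: string[], b: string[]): boolean {
  if (a.length !== b.length) return false;
  const counts = new Map<string, number>();
  for (const row of a) counts.set(row, (counts.get(row) ?? 0) + 1);
  for (const row of b) {
    const n = counts.get(row);
    if (n === undefined || n === 0) return false; // row missing or over-counted
    counts.set(row, n - 1);
  }
  return true;
}

type TableVerdict = "delta" | "ignore" | "force-full";

function classifyTable(
  hasSinglePk: boolean,
  baselineRows: string[],
  currentRows: string[],
): TableVerdict {
  if (hasSinglePk) return "delta"; // row-diffed by primary key
  // No usable PK: harmless if untouched, otherwise the whole push goes full.
  return sameRowSet(baselineRows, currentRows) ? "ignore" : "force-full";
}
```

With this shape, a populated but unchanged wp_term_relationships no longer blocks deltas for the single-PK tables in the same dump.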

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@claude

claude Bot commented Jun 25, 2026

Claude encountered an error.

I'll analyze this and get back to you.

…#17, B3)

Round 3: --incremental engaged but OOM'd (~4 GB) on a 166 MB DB — computeDelta
loaded BOTH full dumps as strings + per-row Maps for both (~20x blowup).

Redesigned the engine (lib/db-delta.ts) to be streaming + manifest-based:
- baseline is stored as a compact per-row HASH MANIFEST (PK -> content hash),
  not the full SQL (lib/db-baseline.ts now persists manifest.json);
- the current dump is STREAMED line-by-line (gz-decompressed on the fly) via a
  line-oriented state machine — which also makes the B1 class of bug structural
  (a "CREATE TABLE" inside an INSERT line is never treated as DDL);
- diffAgainstManifest builds the next manifest as it streams, so re-basing needs
  no extra pass; composite/no-PK tables are change-detected via an order-
  independent aggregate (count + sum of row hashes).
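
A stripped-down version of the streaming, manifest-based diff is sketched below. Everything here is hypothetical: `diffAgainstManifestSketch` is not the real diffAgainstManifest, FNV-1a stands in for whatever content hash the engine uses, the primary key is assumed to be a numeric first column, and lines are taken from any iterable rather than a gz-decompressed stream.

```typescript
type Manifest = Map<string, number>; // "table:pk" -> content hash of the row line

// 32-bit FNV-1a, used only as a dependency-free stand-in for a real content hash.
function fnv1a(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

interface StreamDiff {
  replaces: string[]; // changed or new rows, re-issued as REPLACE statements
  deletes: string[];  // manifest keys that no longer appear in the dump
  next: Manifest;     // manifest to persist after a successful apply
}

// Assumes a per-row dump whose first column is a numeric primary key.
const INSERT_RE = /^INSERT INTO `([^`]+)` VALUES \((\d+)[,)]/;

function diffAgainstManifestSketch(lines: Iterable<string>, baseline: Manifest): StreamDiff {
  const next: Manifest = new Map();
  const replaces: string[] = [];
  let inDdl = false;

  for (const line of lines) {
    if (inDdl) {
      if (line.startsWith(")")) inDdl = false; // ") ENGINE=...;" closes the DDL
      continue;
    }
    if (line.startsWith("CREATE TABLE ")) {
      inDdl = true; // only a line that *starts* with CREATE TABLE is DDL
      continue;
    }
    const m = INSERT_RE.exec(line);
    if (m === null) continue;
    const key = `${m[1]}:${m[2]}`;
    const hash = fnv1a(line);
    next.set(key, hash); // build the next manifest in the same pass
    if (baseline.get(key) !== hash) {
      replaces.push(line.replace(/^INSERT INTO/, "REPLACE INTO"));
    }
  }

  const deletes: string[] = [];
  for (const key of baseline.keys()) {
    if (!next.has(key)) deletes.push(key);
  }
  return { replaces, deletes, next };
}
```

Only one line and the two manifests are held at a time, which is what keeps memory proportional to the manifest rather than to the dump.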

Peak engine memory is now ~manifest-sized: measured 356 MB on an 860K-row /
72 MB dump (~3x the real DB's single-PK row count), vs the old ~4 GB OOM.
db push --incremental still requires a per-row dump (extended-insert rejected).

Rewrote db-delta/db-baseline tests for the streaming API (B1, B2, schema-change,
composite-change, extended-insert, serialize round-trip). Full suite 380 green.
The full db push path remains byte-for-byte unchanged.
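
For the composite/no-PK tables, the order-independent aggregate (count + sum of row hashes) mentioned above could be computed along these lines. `aggregateRows`, `sameAggregate`, and the 32-bit FNV-1a hash are hypothetical stand-ins; the real engine's hash function and sum width are not stated in this thread.

```typescript
// 32-bit FNV-1a, a dependency-free stand-in for the engine's real row hash.
function rowHash(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

interface RowAggregate {
  count: number; // number of rows seen for the table
  sum: number;   // sum of row hashes, wrapped to 32 bits
}

function aggregateRows(rows: Iterable<string>): RowAggregate {
  let count = 0;
  let sum = 0;
  for (const row of rows) {
    count += 1;
    // Addition modulo 2^32 is commutative, so row order does not matter.
    sum = (sum + rowHash(row)) >>> 0;
  }
  return { count, sum };
}

function sameAggregate(a: RowAggregate, b: RowAggregate): boolean {
  return a.count === b.count && a.sum === b.sum;
}
```

The aggregate needs constant memory per table, at the cost of a small collision risk that a per-row manifest does not have; it can only answer "changed or not", which matches the full-fallback rule for these tables.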

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@claude

claude Bot commented Jun 26, 2026

Claude encountered an error.

I'll analyze this and get back to you.

@vikasiwp vikasiwp changed the base branch from fix/cli-db-16-verify-srtables-backups to main June 26, 2026 01:30
@vikasiwp vikasiwp merged commit 666bbb5 into main Jun 26, 2026
1 check failed