IxVM ingress: blob-ref index, sidecar→rbtree, IOBuffer interface by arthurpaulino · Pull Request #457 · argumentcomputer/ix

arthurpaulino · 2026-06-29T12:40:32Z

Three commits on top of main that cut ingress FFT cost on heavy shards and land a documented IxVM IOBuffer interface.

Measurement methodology — read first

Shard 51 with unstubbed blake3 currently OOMs at all practical ulimit settings on the dev machine. To get past memory exhaustion and produce a comparative measurement, the kernel's blake3 verification paths in load_verified_constant / load_verified_blob / load_verified_claim were stubbed to no-ops for profiling. The stubs are not part of any committed code in this PR — they were applied locally during measurement and reverted.

What this means for the numbers below:

The delta between two stubbed runs is real for the non-blake3 portion of the trace. Both endpoints elide the same verification work, so the comparison isolates the effect of the kernel changes in this PR.
The absolute stubbed cost (e.g. "93.94G FFT") is not the production shard-51 typecheck cost. Production includes blake3 work (~16–20% of the trace per earlier profiles) that the stubs omit.

Every shard-51 number below is a "blake3-stubbed trace" measurement. Per-const numbers from lake test -- --ignored ixvm (no stubs) are unaffected and reflect real production cost.

Headline

Metric	Before	After	Δ
Shard 51 (blake3-stubbed trace)	119_152_044_062	93_939_967_063	−21.1% on the non-blake3 portion
`Vector.append` (production, unstubbed)	2_661_244_845	2_607_925_682	−2.00%
`Array.append_assoc` (production)	2_588_157_538	2_537_478_644	−1.96%
`_private...extractMainModule._unsafe_rec` (production)	1_091_354_031	1_064_762_809	−2.44%

Per-const wins are end-to-end on the real (unstubbed) kernel; the shard-51 delta is real on the non-blake3 portion of the trace.

Commits

1. `120001d` — IxVM kernel: augment `addr_pos_map` with blob-ref sentinels

lookup_addr_pos's rbtree only contained const addresses. Every blob ref (literal-blob payload pointers in Constant.refs) probed the map, missed, and fell through to the O(N) linear scan over all_addrs which also returned 0. Wasted work proportional to (blob refs × shard closure size).

This commit adds a third value class to the same rbtree: a sentinel 4294967295 (beyond any honest pos+1), inserted by walking each const's refs once via augment_with_blob_refs. lookup_addr_pos becomes a 3-way match: 0 → fall back to linear scan (now only fires under the de-intern soundness corner-case, ~never in practice); SENTINEL → return 0 directly (known blob); pos+1 → return pos.

Shard 51 (blake3-stubbed): 119_152_044_062 → 102_190_540_489 — −14.2% on the non-blake3 portion.

address_eq row count drops from ~10.3M to ~3.4M (−67%); lookup_addr_pos_linear falls out of the top-25 cost table. The augment-walk's added rbtree probes (~430M FFT in the stubbed trace) are paid back many times over by the ~17B FFT saved on lookups.

Soundness: every probe still relies on the positive direction ptr_val equality ⇒ content equality (Aiur Store content-addressing invariant). The linear fallback stays for the (still theoretical) malicious de-intern, where it uses content-based address_eq.

2. `9acbd58` — IxVM ingress: plumbing helpers + sidecar→rbtree migrations + ptr-passing

A 5-reviewer synthesis of ingress accidental complexity drove this consolidated refactor. Three families of change.

Plumbing helpers:

verify_bytes_against(bytes, expected) + bytes_to_addr(bytes) + blake3_flat(input) centralise the 24-line [h[i][j]] digest reshape that was hand-unrolled at 7 sites. ~100 lines of boilerplate deleted.
Channel I/O helpers factor the io_get_info + #read_byte_stream boilerplate at 6 sites. load_verified_blob no longer double-load(addr)s.
sentinel_blob_ref() named const replaces one bare 4294967295.

Sidecar lists → rbtree:

lookup_canon_addr: O(N) parallel-list walk → O(log N) rbtree probe via new build_canon_addr_map. canon_addrs: List<Addr> arg replaced by canon_addr_map: &RBTreeMap<Addr> (one threaded ptr).
lookup_block_start: O(N) parallel-list walk → O(log N) rbtree probe via new build_block_start_map. block_addrs / block_starts args replaced by block_start_map: &RBTreeMap<G> across ~12 fn signatures.
build_aux_recr_ctor_idxs returns (idxs, block_addr) so the standalone-Recr caller doesn't redo the rec_typ_to_inductive_addr + load_verified_constant chain to recover block_addr.
build_ref_idxs_and_blobs fuses the two separate walks (build_ref_idxs_mapped + build_lit_blobs) into one — single rbtree probe per ref produces both ref_idxs and lit_blobs. 5 caller sites simplified.

Pointer-passing (per Aiur's inputSize→width cost model):

convert_univ(u: &Univ): per-row input width drops from 5-variant union to one G column. Caller convert_univ_idxs already produces the &Univ via list_lookup(univs, idx) — no extra store.
convert_one(input: &ConvertInput): caller convert_all already has &input from the ListNode.Cons destructure.

Dead code deleted: address_in_list, apply_ctor_overrides, lookup_override, build_lit_blobs, build_ref_idxs_mapped, is_blob, ctx_convert_expr.

Shard 51 (blake3-stubbed): 102_190_540_489 → 93_939_967_063 — −8.07% on the non-blake3 portion on top of 120001d.

Per-const wins on heavy production targets (unstubbed lake test):

Constant	Before	After	Δ
`Nat.add_comm`	54_369_745	54_049_773	−0.59%
`Nat.sub_le_of_le_add`	515_331_420	510_843_459	−0.87%
`Nat.decLe`	191_471_719	189_723_325	−0.91%
`Array.append_assoc`	2_567_087_893	2_537_360_311	−1.16%
`Vector.append`	2_639_286_078	2_607_800_745	−1.19%
`IxVMInd.Even.rec`	31_659_493	31_434_525	−0.71%
`IxVMInd.Odd.rec`	31_658_598	31_433_622	−0.71%
`String.Internal.append`	718_803_075	708_296_270	−1.46%
`_private...extractMainModule._unsafe_rec`	1_081_617_705	1_064_689_765	−1.57%

lake test -- --ignored ixvm green; 42 pins re-pinned.

3. `9c2eb23` — IxVM: per-kind IOBuffer channels with documented interface

The pre-existing IOBuffer multiplexed claim bytes, assumption tree bytes, constant wire bytes, and a per-blob-addr empty-marker on a single channel 0, distinguished only by blake3 content-hash collision-impossibility. A new contributor reading load_payload_const(key) couldn't tell from the helper whether they were getting a const, a claim, or a tree — only the caller knew the intent.

This commit splits channel 0 into per-kind channels and lands a documented IxVM IOBuffer interface block (mirrored on both Aiur and Lean sides).

Tier	Channel	Purpose	Key	Value
Ctrl	0	claim wire bytes	`blake3(claim_bytes)`	claim bytes
Ctrl	1	assumption tree bytes	`tree.root`	tree bytes
Const	2	constant wire bytes	const addr	const bytes
Const	3	Defn reducibility hint	Defn addr	single G
Blob	4	blob discriminator	addr	one byte (1=const, 0=blob)
Blob	5	blob raw bytes	blob addr	raw bytes

Tier 1 fires once per verify_claim. Tier 2 fires per constant traversed during load_with_deps. Tier 3 fires per blob ref encountered during build_ref_idxs_and_blobs.

The blob discriminator (ch 4) replaces the previous io_get_info(0, addr).len == 0 ⟹ blob hack inside load_with_deps with an explicit per-addr one-byte payload. No more len-as-meaning overload.

Every per-channel io_get_info + #read_byte_stream pair is inlined at the single consumer of each channel — no fn-call row gets added for the reorganization. Channel reorganization is cost-neutral by design, confirmed by measurement: per-const pin shifts vs 9acbd58 land within ±0.010% (noise-tier; the ch-4 discriminator's one read_byte trades against the eliminated io_get_info(0, addr).len check).

The win here is review-time clarity, not FFT — io_get_info(channel, key) now has one value shape per channel and a doc block both kernel and harness reference.

Soundness: ch 0/1/2/5 byte streams are blake3-verified by the kernel against their content-addressed keys. ch 3 hint is semantically optional (controls WHNF reduction heuristic only; def-eq is sound either way). ch 4 discriminator is sound by erasure-correctness — a lying byte flips the const/blob decision and the wrong-path load downstream fails (a "const" blob triggers a ch 2 read that returns empty → blake3 verify against the non-empty addr fails; a "blob" const dangles references → typecheck fail).

Cumulative measurement (vs `main`)

Constant	Before	After	Δ
`Nat.add_comm`	54_504_714 (approx.)	54_047_004	~−0.84%
`Nat.decLe`	191_354_793	189_722_204	−0.85%
`Nat.sub_le_of_le_add`	515_056_158	510_870_092	−0.81%
`Array.append_assoc`	2_588_157_538	2_537_478_644	−1.96%
`Vector.append`	2_661_244_845	2_607_925_682	−2.00%
`String.Internal.append`	725_415_461	708_332_683	−2.36%
`_private...extractMainModule._unsafe_rec`	1_091_354_031	1_064_762_809	−2.44%
Shard 51 (blake3-stubbed trace, non-blake3 portion)	119_152_044_062	93_939_967_063	−21.1%

Test plan

lake build clean
lake test -- --ignored ixvm green at every commit
Shard 51 comparative measurement with blake3 stubs applied locally to both endpoints (stubs reverted; not part of any committed code in this PR)

Replace `klimbs_mul(base, klimbs_pow(base, klimbs_dec(exp)))`'s O(exp) per-step memo growth with a binary-exponentiation recursion of depth O(log exp). Both `exp / 2` and `exp % 2` route through the native `unconstrained_big_uint_div_mod` op, so each level pays O(1) for the division — `log2(exp)` memo entries for `klimbs_pow` itself + the `klimbs_mul(base, base)` doublings. `lake test -- --ignored ixvm` on `IxVMPrim.nat_pow_big` (`(2 ^ 16384 : Nat) - 2 ^ 16384 = 0`, measured against the current `ap/utf8-mod-witness` baseline): 3_941_276_885 → 71_856_587 FFT (-98.2%). Tests/Ix/IxVM.lean: * new `IxVMPrim.nat_pow_big` synthetic — exercises `klimbs_pow` at a non-trivial exponent (`2 ^ 16384`) so future regressions in the body shift its FFT pin. * small pin shifts in `nat_shl_lit`, `nat_shr_lit`, `nat_div_lit`, `nat_mod_lit`, `nat_gcd_lit`, `bv_to_nat_lit` — all reduce via `klimbs_pow` / `klimbs_div_mod` and absorb the new cost shape.

`lookup_addr_pos`'s rbtree (`addr_pos_map`) only contained const addresses. Every blob ref (literal-blob payload pointers in `Constant.refs`) probed the map, missed, and fell through to the O(N) linear scan over `all_addrs` which also returned 0. Wasted work proportional to (blob refs × shard closure size). Add a third value class to the same rbtree: the sentinel `4294967295` (beyond any honest `pos+1`), inserted by walking each const's `refs` once via `augment_with_blob_refs`. Any ref not already mapped to a const-position gets the sentinel. `lookup_addr_pos` becomes a 3-way match: `0` → fall back to linear scan (now only fires under the de-intern soundness corner-case, ~never in practice); `SENTINEL` → return 0 directly (known blob); `pos+1` → return `pos`. `is_blob` mirrors the same match. Soundness preserved: every probe still relies on the positive direction `ptr_val` equality ⇒ content equality (Aiur Store's content-addressing invariant). The linear fallback stays put for the (still theoretical) malicious de-intern, where it uses content-based `address_eq`. `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 51` (`ulimit -v 23000000`, with the kernel's blake3 paths in `load_verified_constant` / `load_verified_blob` / `load_verified_claim` stubbed for profiling): 119_152_044_062 → 102_190_540_489 FFT (-14.2%). Tests/Ix/IxVM.lean: 42 pin shifts absorbing the per-ingress augment-walk overhead.

A consolidated refactor of the ingress pipeline guided by a 5-reviewer synthesis of accidental complexity sites. Three families of change: PLUMBING * `Blake3.lean`: `verify_bytes_against(bytes, expected)` + `bytes_to_addr(bytes)` + `blake3_flat(input)` centralise the 24-line `[h[i][j]]` digest reshape that was hand-unrolled at 7 sites. ~100 lines of boilerplate deleted. * `Ingress.lean`: `load_payload_const`/`blob`/`hint` + `ch_const`/`blob`/`hint` factor the `io_get_info` + `#read_byte_stream` boilerplate at 6 sites. `load_verified_blob` no longer double-`load(addr)`s. * `sentinel_blob_ref()` named const replaces one bare `4294967295`. SIDECAR LISTS → RBTREE * `lookup_canon_addr`: O(N) parallel-list walk → O(log N) rbtree probe via new `build_canon_addr_map`. `canon_addrs: List<Addr>` arg replaced by `canon_addr_map: &RBTreeMap<Addr>` (one threaded ptr). * `lookup_block_start`: O(N) parallel-list walk → O(log N) rbtree probe via new `build_block_start_map`. `block_addrs` / `block_starts` args replaced by `block_start_map: &RBTreeMap<G>` across ~12 fn signatures. * `build_aux_recr_ctor_idxs` returns `(idxs, block_addr)` so the standalone-Recr caller in `build_convert_inputs_walk` doesn't redo the `rec_typ_to_inductive_addr` + `load_verified_constant` chain to recover `block_addr` — `build_aux_recr_ctor_idxs` already computed it internally. * `build_ref_idxs_and_blobs` fuses the two separate walks (`build_ref_idxs_mapped` + `build_lit_blobs`) into one — single rbtree probe per ref produces both `ref_idxs` and `lit_blobs`. 5 caller sites simplified. PTR-PASSING (per `reference_aiur_pass_pointer_not_value`) * `convert_univ(u: &Univ)`: per-row input width drops from 5-variant union to one G column. Caller `convert_univ_idxs` already produces the `&Univ` via `list_lookup(univs, idx)` — no extra `store`. * `convert_one(input: &ConvertInput)`: caller `convert_all` already has `&input` from the `ListNode.Cons` destructure. DEAD CODE DELETED `address_in_list`, `apply_ctor_overrides`, `lookup_override`, `build_lit_blobs`, `build_ref_idxs_mapped`, `is_blob`, `ctx_convert_expr` — all unused after the migrations above. TRIED + REVERTED (left as notes in code where instructive): * Fused `build_addr_pos_map` + `augment_with_blob_refs` single-pass. Lost to Aiur's per-row width tax — two narrower passes cheaper than one merged wide pass. * `convert_expr(ctx: &ConvertCtx)` ptr. Per-arm `load(ctx)` cost > raw-arg threading. * `compute_layout_walk` seen-mptrs/seen-poses → rbtree. Small regression on typical shards (few enough Muts blocks that O(N) list scan is already cheap; rbtree value column tax > savings). * `block_members_map` to skip per-projection `load_verified_constant`. Lost because Aiur memoises `load_verified_constant` automatically — the new rbtree probe's width tax exceeded the savings from the eliminated match cascade. * `ctx` everywhere as `&ConvertCtx` (with `store(...)` at 5 construction sites). The added `store` rows offset the per-row savings — wash. `lake test -- --ignored ixvm`: 42 pin shifts, all measurable wins vs the `ap/blob-ref-rbtree-augment` tip. Cumulative on heavy consts: * `Nat.add_comm`: 54_369_745 → 54_049_773 FFT (-0.59%) * `Nat.sub_le_of_le_add`: 515_331_420 → 510_843_459 FFT (-0.87%) * `Array.append_assoc`: 2_567_087_893 → 2_537_360_311 FFT (-1.16%) * `Vector.append`: 2_639_286_078 → 2_607_800_745 FFT (-1.19%) * `String.Internal.append`: 718_803_075 → 708_296_270 FFT (-1.46%) * `IxVMInd.Even.rec`: 31_659_493 → 31_434_525 FFT (-0.71%) * `_private....extractMainModule._unsafe_rec`: 1_081_617_705 → 1_064_689_765 FFT (-1.57%)

Replaces channel 0's tagged-union (claim / tree / const bytes / empty- blob-marker, distinguished only by key content-hash collision- impossibility) with one channel per value kind. Channel reorganization is cost-neutral — all six per-channel `io_get_info` + `#read_byte_stream` pairs are inlined at the single consumer of each channel, so no fn-call row gets added. New layout, tiered by access pattern: | Tier | Channel | Purpose | Key | Value | |-------|---------|--------------------------|-------------------------|----------------------| | Ctrl | 0 | claim wire bytes | `blake3(claim_bytes)` | claim bytes | | Ctrl | 1 | assumption tree bytes | `tree.root` | tree bytes | | Const | 2 | constant wire bytes | const addr | const bytes | | Const | 3 | Defn reducibility hint | Defn addr | single G | | Blob | 4 | blob discriminator | addr | one byte (1=const, 0=blob) | | Blob | 5 | blob raw bytes | blob addr | raw bytes | Tier 1 = one-per-`verify_claim`. Tier 2 = per-const. Tier 3 = per-blob. `io_get_info(channel, key)` is now unambiguous by channel alone. Aiur side (`Ingress.lean`, `Kernel/Claim.lean`): * `load_verified_constant` inlines `io_get_info(2, raw)` + `#read_byte_stream(2, ...)` (was ch 0). * `load_verified_blob` inlines ch 5 read (was ch 1). * `load_constant_hint` inlines ch 3 read (was ch 2). * `load_with_deps` inlines a ch 4 discriminator probe replacing the former `io_get_info(0, addr).len == 0 ⟹ blob` hack on ch 0. One read_byte for the per-addr discriminator byte. * `load_verified_claim` inlines ch 0 (was using a shared `load_payload_const` helper on ch 0). * `load_assumption_tree` inlines ch 1 (was using same `load_payload_const` helper on ch 0). * Helper wrappers (`load_payload_claim/tree/const/hint/blob`, `load_discriminator`, `ch_*` constants) all removed — every channel has exactly one consumer site and the inlined dispatch costs no fn call. * Top-of-file `IxVM IOBuffer interface` doc block documents the layout + soundness model. Lean side (`ClaimHarness.lean`): * `addEntries` writes per channel — consts → ch 2 + per-addr discriminator `[1]` on ch 4; blobs → ch 5 + per-addr discriminator `[0]` on ch 4; Defn hints → ch 3. * `seedTreeAt` writes tree bytes to ch 1 (was ch 0). * `buildClaimWitness` writes claim bytes to ch 0 (unchanged). * Drops the empty-marker write on ch 0 for blob addrs — the ch 4 discriminator covers blob/const classification explicitly. * Top-of-section `IxVM IOBuffer interface` doc block mirrors the Aiur-side comment. Soundness model unchanged. ch 0/1/2/5 byte streams are blake3-verified by the kernel against their content-addressed keys. ch 3 hint is semantically optional (controls WHNF reduction heuristic only; def-eq is sound either way). ch 4 discriminator is sound by erasure- correctness — a lying byte flips the const/blob decision and the wrong-path load downstream fails (a "const" blob triggers a ch 2 read that returns empty → blake3 verify against the non-empty addr fails; a "blob" const dangles references → typecheck fail). `lake test -- --ignored ixvm` green; 42 pins shifted within ±0.010% (noise-tier) vs the pre-shuffle baseline — channel reorganization is cost-neutral as required, the discriminator's one read_byte trades against the eliminated `io_get_info(0, ...).len` check.

arthurpaulino added 4 commits June 30, 2026 12:09

arthurpaulino force-pushed the ap/ingress-refactor branch from 9c2eb23 to ee4da06 Compare June 30, 2026 19:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

IxVM ingress: blob-ref index, sidecar→rbtree, IOBuffer interface#457

IxVM ingress: blob-ref index, sidecar→rbtree, IOBuffer interface#457
arthurpaulino wants to merge 4 commits into
mainfrom
ap/ingress-refactor

arthurpaulino commented Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

arthurpaulino commented Jun 29, 2026

Measurement methodology — read first

Headline

Commits

1. 120001d — IxVM kernel: augment addr_pos_map with blob-ref sentinels

2. 9acbd58 — IxVM ingress: plumbing helpers + sidecar→rbtree migrations + ptr-passing

3. 9c2eb23 — IxVM: per-kind IOBuffer channels with documented interface

Cumulative measurement (vs main)

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. `120001d` — IxVM kernel: augment `addr_pos_map` with blob-ref sentinels

2. `9acbd58` — IxVM ingress: plumbing helpers + sidecar→rbtree migrations + ptr-passing

3. `9c2eb23` — IxVM: per-kind IOBuffer channels with documented interface

Cumulative measurement (vs `main`)