IxVM ingress: blob-ref index, sidecar→rbtree, IOBuffer interface#457
Draft
arthurpaulino wants to merge 4 commits into
Draft
IxVM ingress: blob-ref index, sidecar→rbtree, IOBuffer interface#457arthurpaulino wants to merge 4 commits into
arthurpaulino wants to merge 4 commits into
Conversation
Replace `klimbs_mul(base, klimbs_pow(base, klimbs_dec(exp)))`'s O(exp) per-step memo growth with a binary-exponentiation recursion of depth O(log exp). Both `exp / 2` and `exp % 2` route through the native `unconstrained_big_uint_div_mod` op, so each level pays O(1) for the division — `log2(exp)` memo entries for `klimbs_pow` itself + the `klimbs_mul(base, base)` doublings. `lake test -- --ignored ixvm` on `IxVMPrim.nat_pow_big` (`(2 ^ 16384 : Nat) - 2 ^ 16384 = 0`, measured against the current `ap/utf8-mod-witness` baseline): 3_941_276_885 → 71_856_587 FFT (-98.2%). Tests/Ix/IxVM.lean: * new `IxVMPrim.nat_pow_big` synthetic — exercises `klimbs_pow` at a non-trivial exponent (`2 ^ 16384`) so future regressions in the body shift its FFT pin. * small pin shifts in `nat_shl_lit`, `nat_shr_lit`, `nat_div_lit`, `nat_mod_lit`, `nat_gcd_lit`, `bv_to_nat_lit` — all reduce via `klimbs_pow` / `klimbs_div_mod` and absorb the new cost shape.
`lookup_addr_pos`'s rbtree (`addr_pos_map`) only contained const addresses. Every blob ref (literal-blob payload pointers in `Constant.refs`) probed the map, missed, and fell through to the O(N) linear scan over `all_addrs` which also returned 0. Wasted work proportional to (blob refs × shard closure size). Add a third value class to the same rbtree: the sentinel `4294967295` (beyond any honest `pos+1`), inserted by walking each const's `refs` once via `augment_with_blob_refs`. Any ref not already mapped to a const-position gets the sentinel. `lookup_addr_pos` becomes a 3-way match: `0` → fall back to linear scan (now only fires under the de-intern soundness corner-case, ~never in practice); `SENTINEL` → return 0 directly (known blob); `pos+1` → return `pos`. `is_blob` mirrors the same match. Soundness preserved: every probe still relies on the positive direction `ptr_val` equality ⇒ content equality (Aiur Store's content-addressing invariant). The linear fallback stays put for the (still theoretical) malicious de-intern, where it uses content-based `address_eq`. `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 51` (`ulimit -v 23000000`, with the kernel's blake3 paths in `load_verified_constant` / `load_verified_blob` / `load_verified_claim` stubbed for profiling): 119_152_044_062 → 102_190_540_489 FFT (-14.2%). Tests/Ix/IxVM.lean: 42 pin shifts absorbing the per-ingress augment-walk overhead.
A consolidated refactor of the ingress pipeline guided by a 5-reviewer synthesis of accidental complexity sites. Three families of change: PLUMBING * `Blake3.lean`: `verify_bytes_against(bytes, expected)` + `bytes_to_addr(bytes)` + `blake3_flat(input)` centralise the 24-line `[h[i][j]]` digest reshape that was hand-unrolled at 7 sites. ~100 lines of boilerplate deleted. * `Ingress.lean`: `load_payload_const`/`blob`/`hint` + `ch_const`/`blob`/`hint` factor the `io_get_info` + `#read_byte_stream` boilerplate at 6 sites. `load_verified_blob` no longer double-`load(addr)`s. * `sentinel_blob_ref()` named const replaces one bare `4294967295`. SIDECAR LISTS → RBTREE * `lookup_canon_addr`: O(N) parallel-list walk → O(log N) rbtree probe via new `build_canon_addr_map`. `canon_addrs: List<Addr>` arg replaced by `canon_addr_map: &RBTreeMap<Addr>` (one threaded ptr). * `lookup_block_start`: O(N) parallel-list walk → O(log N) rbtree probe via new `build_block_start_map`. `block_addrs` / `block_starts` args replaced by `block_start_map: &RBTreeMap<G>` across ~12 fn signatures. * `build_aux_recr_ctor_idxs` returns `(idxs, block_addr)` so the standalone-Recr caller in `build_convert_inputs_walk` doesn't redo the `rec_typ_to_inductive_addr` + `load_verified_constant` chain to recover `block_addr` — `build_aux_recr_ctor_idxs` already computed it internally. * `build_ref_idxs_and_blobs` fuses the two separate walks (`build_ref_idxs_mapped` + `build_lit_blobs`) into one — single rbtree probe per ref produces both `ref_idxs` and `lit_blobs`. 5 caller sites simplified. PTR-PASSING (per `reference_aiur_pass_pointer_not_value`) * `convert_univ(u: &Univ)`: per-row input width drops from 5-variant union to one G column. Caller `convert_univ_idxs` already produces the `&Univ` via `list_lookup(univs, idx)` — no extra `store`. * `convert_one(input: &ConvertInput)`: caller `convert_all` already has `&input` from the `ListNode.Cons` destructure. DEAD CODE DELETED `address_in_list`, `apply_ctor_overrides`, `lookup_override`, `build_lit_blobs`, `build_ref_idxs_mapped`, `is_blob`, `ctx_convert_expr` — all unused after the migrations above. TRIED + REVERTED (left as notes in code where instructive): * Fused `build_addr_pos_map` + `augment_with_blob_refs` single-pass. Lost to Aiur's per-row width tax — two narrower passes cheaper than one merged wide pass. * `convert_expr(ctx: &ConvertCtx)` ptr. Per-arm `load(ctx)` cost > raw-arg threading. * `compute_layout_walk` seen-mptrs/seen-poses → rbtree. Small regression on typical shards (few enough Muts blocks that O(N) list scan is already cheap; rbtree value column tax > savings). * `block_members_map` to skip per-projection `load_verified_constant`. Lost because Aiur memoises `load_verified_constant` automatically — the new rbtree probe's width tax exceeded the savings from the eliminated match cascade. * `ctx` everywhere as `&ConvertCtx` (with `store(...)` at 5 construction sites). The added `store` rows offset the per-row savings — wash. `lake test -- --ignored ixvm`: 42 pin shifts, all measurable wins vs the `ap/blob-ref-rbtree-augment` tip. Cumulative on heavy consts: * `Nat.add_comm`: 54_369_745 → 54_049_773 FFT (-0.59%) * `Nat.sub_le_of_le_add`: 515_331_420 → 510_843_459 FFT (-0.87%) * `Array.append_assoc`: 2_567_087_893 → 2_537_360_311 FFT (-1.16%) * `Vector.append`: 2_639_286_078 → 2_607_800_745 FFT (-1.19%) * `String.Internal.append`: 718_803_075 → 708_296_270 FFT (-1.46%) * `IxVMInd.Even.rec`: 31_659_493 → 31_434_525 FFT (-0.71%) * `_private....extractMainModule._unsafe_rec`: 1_081_617_705 → 1_064_689_765 FFT (-1.57%)
Replaces channel 0's tagged-union (claim / tree / const bytes / empty- blob-marker, distinguished only by key content-hash collision- impossibility) with one channel per value kind. Channel reorganization is cost-neutral — all six per-channel `io_get_info` + `#read_byte_stream` pairs are inlined at the single consumer of each channel, so no fn-call row gets added. New layout, tiered by access pattern: | Tier | Channel | Purpose | Key | Value | |-------|---------|--------------------------|-------------------------|----------------------| | Ctrl | 0 | claim wire bytes | `blake3(claim_bytes)` | claim bytes | | Ctrl | 1 | assumption tree bytes | `tree.root` | tree bytes | | Const | 2 | constant wire bytes | const addr | const bytes | | Const | 3 | Defn reducibility hint | Defn addr | single G | | Blob | 4 | blob discriminator | addr | one byte (1=const, 0=blob) | | Blob | 5 | blob raw bytes | blob addr | raw bytes | Tier 1 = one-per-`verify_claim`. Tier 2 = per-const. Tier 3 = per-blob. `io_get_info(channel, key)` is now unambiguous by channel alone. Aiur side (`Ingress.lean`, `Kernel/Claim.lean`): * `load_verified_constant` inlines `io_get_info(2, raw)` + `#read_byte_stream(2, ...)` (was ch 0). * `load_verified_blob` inlines ch 5 read (was ch 1). * `load_constant_hint` inlines ch 3 read (was ch 2). * `load_with_deps` inlines a ch 4 discriminator probe replacing the former `io_get_info(0, addr).len == 0 ⟹ blob` hack on ch 0. One read_byte for the per-addr discriminator byte. * `load_verified_claim` inlines ch 0 (was using a shared `load_payload_const` helper on ch 0). * `load_assumption_tree` inlines ch 1 (was using same `load_payload_const` helper on ch 0). * Helper wrappers (`load_payload_claim/tree/const/hint/blob`, `load_discriminator`, `ch_*` constants) all removed — every channel has exactly one consumer site and the inlined dispatch costs no fn call. * Top-of-file `IxVM IOBuffer interface` doc block documents the layout + soundness model. Lean side (`ClaimHarness.lean`): * `addEntries` writes per channel — consts → ch 2 + per-addr discriminator `[1]` on ch 4; blobs → ch 5 + per-addr discriminator `[0]` on ch 4; Defn hints → ch 3. * `seedTreeAt` writes tree bytes to ch 1 (was ch 0). * `buildClaimWitness` writes claim bytes to ch 0 (unchanged). * Drops the empty-marker write on ch 0 for blob addrs — the ch 4 discriminator covers blob/const classification explicitly. * Top-of-section `IxVM IOBuffer interface` doc block mirrors the Aiur-side comment. Soundness model unchanged. ch 0/1/2/5 byte streams are blake3-verified by the kernel against their content-addressed keys. ch 3 hint is semantically optional (controls WHNF reduction heuristic only; def-eq is sound either way). ch 4 discriminator is sound by erasure- correctness — a lying byte flips the const/blob decision and the wrong-path load downstream fails (a "const" blob triggers a ch 2 read that returns empty → blake3 verify against the non-empty addr fails; a "blob" const dangles references → typecheck fail). `lake test -- --ignored ixvm` green; 42 pins shifted within ±0.010% (noise-tier) vs the pre-shuffle baseline — channel reorganization is cost-neutral as required, the discriminator's one read_byte trades against the eliminated `io_get_info(0, ...).len` check.
9c2eb23 to
ee4da06
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Three commits on top of
mainthat cut ingress FFT cost on heavy shards and land a documented IxVM IOBuffer interface.Measurement methodology — read first
Shard 51 with unstubbed blake3 currently OOMs at all practical
ulimitsettings on the dev machine. To get past memory exhaustion and produce a comparative measurement, the kernel's blake3 verification paths inload_verified_constant/load_verified_blob/load_verified_claimwere stubbed to no-ops for profiling. The stubs are not part of any committed code in this PR — they were applied locally during measurement and reverted.What this means for the numbers below:
Every shard-51 number below is a "blake3-stubbed trace" measurement. Per-const numbers from
lake test -- --ignored ixvm(no stubs) are unaffected and reflect real production cost.Headline
Vector.append(production, unstubbed)Array.append_assoc(production)_private...extractMainModule._unsafe_rec(production)Per-const wins are end-to-end on the real (unstubbed) kernel; the shard-51 delta is real on the non-blake3 portion of the trace.
Commits
1.
120001d— IxVM kernel: augmentaddr_pos_mapwith blob-ref sentinelslookup_addr_pos's rbtree only contained const addresses. Every blob ref (literal-blob payload pointers inConstant.refs) probed the map, missed, and fell through to the O(N) linear scan overall_addrswhich also returned 0. Wasted work proportional to (blob refs × shard closure size).This commit adds a third value class to the same rbtree: a sentinel
4294967295(beyond any honestpos+1), inserted by walking each const'srefsonce viaaugment_with_blob_refs.lookup_addr_posbecomes a 3-way match:0→ fall back to linear scan (now only fires under the de-intern soundness corner-case, ~never in practice);SENTINEL→ return 0 directly (known blob);pos+1→ returnpos.Shard 51 (blake3-stubbed): 119_152_044_062 → 102_190_540_489 — −14.2% on the non-blake3 portion.
address_eqrow count drops from ~10.3M to ~3.4M (−67%);lookup_addr_pos_linearfalls out of the top-25 cost table. The augment-walk's added rbtree probes (~430M FFT in the stubbed trace) are paid back many times over by the ~17B FFT saved on lookups.Soundness: every probe still relies on the positive direction
ptr_valequality ⇒ content equality (Aiur Store content-addressing invariant). The linear fallback stays for the (still theoretical) malicious de-intern, where it uses content-basedaddress_eq.2.
9acbd58— IxVM ingress: plumbing helpers + sidecar→rbtree migrations + ptr-passingA 5-reviewer synthesis of ingress accidental complexity drove this consolidated refactor. Three families of change.
Plumbing helpers:
verify_bytes_against(bytes, expected)+bytes_to_addr(bytes)+blake3_flat(input)centralise the 24-line[h[i][j]]digest reshape that was hand-unrolled at 7 sites. ~100 lines of boilerplate deleted.io_get_info+#read_byte_streamboilerplate at 6 sites.load_verified_blobno longer double-load(addr)s.sentinel_blob_ref()named const replaces one bare4294967295.Sidecar lists → rbtree:
lookup_canon_addr: O(N) parallel-list walk → O(log N) rbtree probe via newbuild_canon_addr_map.canon_addrs: List<Addr>arg replaced bycanon_addr_map: &RBTreeMap<Addr>(one threaded ptr).lookup_block_start: O(N) parallel-list walk → O(log N) rbtree probe via newbuild_block_start_map.block_addrs/block_startsargs replaced byblock_start_map: &RBTreeMap<G>across ~12 fn signatures.build_aux_recr_ctor_idxsreturns(idxs, block_addr)so the standalone-Recr caller doesn't redo therec_typ_to_inductive_addr+load_verified_constantchain to recoverblock_addr.build_ref_idxs_and_blobsfuses the two separate walks (build_ref_idxs_mapped+build_lit_blobs) into one — single rbtree probe per ref produces bothref_idxsandlit_blobs. 5 caller sites simplified.Pointer-passing (per Aiur's
inputSize→widthcost model):convert_univ(u: &Univ): per-row input width drops from 5-variant union to one G column. Callerconvert_univ_idxsalready produces the&Univvialist_lookup(univs, idx)— no extrastore.convert_one(input: &ConvertInput): callerconvert_allalready has&inputfrom theListNode.Consdestructure.Dead code deleted:
address_in_list,apply_ctor_overrides,lookup_override,build_lit_blobs,build_ref_idxs_mapped,is_blob,ctx_convert_expr.Shard 51 (blake3-stubbed): 102_190_540_489 → 93_939_967_063 — −8.07% on the non-blake3 portion on top of
120001d.Per-const wins on heavy production targets (unstubbed
lake test):Nat.add_commNat.sub_le_of_le_addNat.decLeArray.append_assocVector.appendIxVMInd.Even.recIxVMInd.Odd.recString.Internal.append_private...extractMainModule._unsafe_reclake test -- --ignored ixvmgreen; 42 pins re-pinned.3.
9c2eb23— IxVM: per-kind IOBuffer channels with documented interfaceThe pre-existing IOBuffer multiplexed claim bytes, assumption tree bytes, constant wire bytes, and a per-blob-addr empty-marker on a single channel 0, distinguished only by blake3 content-hash collision-impossibility. A new contributor reading
load_payload_const(key)couldn't tell from the helper whether they were getting a const, a claim, or a tree — only the caller knew the intent.This commit splits channel 0 into per-kind channels and lands a documented IxVM IOBuffer interface block (mirrored on both Aiur and Lean sides).
blake3(claim_bytes)tree.rootTier 1 fires once per
verify_claim. Tier 2 fires per constant traversed duringload_with_deps. Tier 3 fires per blob ref encountered duringbuild_ref_idxs_and_blobs.The blob discriminator (ch 4) replaces the previous
io_get_info(0, addr).len == 0 ⟹ blobhack insideload_with_depswith an explicit per-addr one-byte payload. No morelen-as-meaning overload.Every per-channel
io_get_info+#read_byte_streampair is inlined at the single consumer of each channel — no fn-call row gets added for the reorganization. Channel reorganization is cost-neutral by design, confirmed by measurement: per-const pin shifts vs9acbd58land within ±0.010% (noise-tier; the ch-4 discriminator's oneread_bytetrades against the eliminatedio_get_info(0, addr).lencheck).The win here is review-time clarity, not FFT —
io_get_info(channel, key)now has one value shape per channel and a doc block both kernel and harness reference.Soundness: ch 0/1/2/5 byte streams are blake3-verified by the kernel against their content-addressed keys. ch 3 hint is semantically optional (controls WHNF reduction heuristic only; def-eq is sound either way). ch 4 discriminator is sound by erasure-correctness — a lying byte flips the const/blob decision and the wrong-path load downstream fails (a "const" blob triggers a ch 2 read that returns empty → blake3 verify against the non-empty addr fails; a "blob" const dangles references → typecheck fail).
Cumulative measurement (vs
main)Nat.add_commNat.decLeNat.sub_le_of_le_addArray.append_assocVector.appendString.Internal.append_private...extractMainModule._unsafe_recTest plan
lake buildcleanlake test -- --ignored ixvmgreen at every commit