IxVM: native execution pipeline#463
Draft
arthurpaulino wants to merge 11 commits into
Draft
Conversation
Replace `klimbs_mul(base, klimbs_pow(base, klimbs_dec(exp)))`'s O(exp) per-step memo growth with a binary-exponentiation recursion of depth O(log exp). Both `exp / 2` and `exp % 2` route through the native `unconstrained_big_uint_div_mod` op, so each level pays O(1) for the division — `log2(exp)` memo entries for `klimbs_pow` itself + the `klimbs_mul(base, base)` doublings. `lake test -- --ignored ixvm` on `IxVMPrim.nat_pow_big` (`(2 ^ 16384 : Nat) - 2 ^ 16384 = 0`, measured against the current `ap/utf8-mod-witness` baseline): 3_941_276_885 → 71_856_587 FFT (-98.2%). Tests/Ix/IxVM.lean: * new `IxVMPrim.nat_pow_big` synthetic — exercises `klimbs_pow` at a non-trivial exponent (`2 ^ 16384`) so future regressions in the body shift its FFT pin. * small pin shifts in `nat_shl_lit`, `nat_shr_lit`, `nat_div_lit`, `nat_mod_lit`, `nat_gcd_lit`, `bv_to_nat_lit` — all reduce via `klimbs_pow` / `klimbs_div_mod` and absorb the new cost shape.
`lookup_addr_pos`'s rbtree (`addr_pos_map`) only contained const addresses. Every blob ref (literal-blob payload pointers in `Constant.refs`) probed the map, missed, and fell through to the O(N) linear scan over `all_addrs` which also returned 0. Wasted work proportional to (blob refs × shard closure size). Add a third value class to the same rbtree: the sentinel `4294967295` (beyond any honest `pos+1`), inserted by walking each const's `refs` once via `augment_with_blob_refs`. Any ref not already mapped to a const-position gets the sentinel. `lookup_addr_pos` becomes a 3-way match: `0` → fall back to linear scan (now only fires under the de-intern soundness corner-case, ~never in practice); `SENTINEL` → return 0 directly (known blob); `pos+1` → return `pos`. `is_blob` mirrors the same match. Soundness preserved: every probe still relies on the positive direction `ptr_val` equality ⇒ content equality (Aiur Store's content-addressing invariant). The linear fallback stays put for the (still theoretical) malicious de-intern, where it uses content-based `address_eq`. `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 51` (`ulimit -v 23000000`, with the kernel's blake3 paths in `load_verified_constant` / `load_verified_blob` / `load_verified_claim` stubbed for profiling): 119_152_044_062 → 102_190_540_489 FFT (-14.2%). Tests/Ix/IxVM.lean: 42 pin shifts absorbing the per-ingress augment-walk overhead.
A consolidated refactor of the ingress pipeline guided by a 5-reviewer synthesis of accidental complexity sites. Three families of change: PLUMBING * `Blake3.lean`: `verify_bytes_against(bytes, expected)` + `bytes_to_addr(bytes)` + `blake3_flat(input)` centralise the 24-line `[h[i][j]]` digest reshape that was hand-unrolled at 7 sites. ~100 lines of boilerplate deleted. * `Ingress.lean`: `load_payload_const`/`blob`/`hint` + `ch_const`/`blob`/`hint` factor the `io_get_info` + `#read_byte_stream` boilerplate at 6 sites. `load_verified_blob` no longer double-`load(addr)`s. * `sentinel_blob_ref()` named const replaces one bare `4294967295`. SIDECAR LISTS → RBTREE * `lookup_canon_addr`: O(N) parallel-list walk → O(log N) rbtree probe via new `build_canon_addr_map`. `canon_addrs: List<Addr>` arg replaced by `canon_addr_map: &RBTreeMap<Addr>` (one threaded ptr). * `lookup_block_start`: O(N) parallel-list walk → O(log N) rbtree probe via new `build_block_start_map`. `block_addrs` / `block_starts` args replaced by `block_start_map: &RBTreeMap<G>` across ~12 fn signatures. * `build_aux_recr_ctor_idxs` returns `(idxs, block_addr)` so the standalone-Recr caller in `build_convert_inputs_walk` doesn't redo the `rec_typ_to_inductive_addr` + `load_verified_constant` chain to recover `block_addr` — `build_aux_recr_ctor_idxs` already computed it internally. * `build_ref_idxs_and_blobs` fuses the two separate walks (`build_ref_idxs_mapped` + `build_lit_blobs`) into one — single rbtree probe per ref produces both `ref_idxs` and `lit_blobs`. 5 caller sites simplified. PTR-PASSING (per `reference_aiur_pass_pointer_not_value`) * `convert_univ(u: &Univ)`: per-row input width drops from 5-variant union to one G column. Caller `convert_univ_idxs` already produces the `&Univ` via `list_lookup(univs, idx)` — no extra `store`. * `convert_one(input: &ConvertInput)`: caller `convert_all` already has `&input` from the `ListNode.Cons` destructure. DEAD CODE DELETED `address_in_list`, `apply_ctor_overrides`, `lookup_override`, `build_lit_blobs`, `build_ref_idxs_mapped`, `is_blob`, `ctx_convert_expr` — all unused after the migrations above. TRIED + REVERTED (left as notes in code where instructive): * Fused `build_addr_pos_map` + `augment_with_blob_refs` single-pass. Lost to Aiur's per-row width tax — two narrower passes cheaper than one merged wide pass. * `convert_expr(ctx: &ConvertCtx)` ptr. Per-arm `load(ctx)` cost > raw-arg threading. * `compute_layout_walk` seen-mptrs/seen-poses → rbtree. Small regression on typical shards (few enough Muts blocks that O(N) list scan is already cheap; rbtree value column tax > savings). * `block_members_map` to skip per-projection `load_verified_constant`. Lost because Aiur memoises `load_verified_constant` automatically — the new rbtree probe's width tax exceeded the savings from the eliminated match cascade. * `ctx` everywhere as `&ConvertCtx` (with `store(...)` at 5 construction sites). The added `store` rows offset the per-row savings — wash. `lake test -- --ignored ixvm`: 42 pin shifts, all measurable wins vs the `ap/blob-ref-rbtree-augment` tip. Cumulative on heavy consts: * `Nat.add_comm`: 54_369_745 → 54_049_773 FFT (-0.59%) * `Nat.sub_le_of_le_add`: 515_331_420 → 510_843_459 FFT (-0.87%) * `Array.append_assoc`: 2_567_087_893 → 2_537_360_311 FFT (-1.16%) * `Vector.append`: 2_639_286_078 → 2_607_800_745 FFT (-1.19%) * `String.Internal.append`: 718_803_075 → 708_296_270 FFT (-1.46%) * `IxVMInd.Even.rec`: 31_659_493 → 31_434_525 FFT (-0.71%) * `_private....extractMainModule._unsafe_rec`: 1_081_617_705 → 1_064_689_765 FFT (-1.57%)
Replaces channel 0's tagged-union (claim / tree / const bytes / empty- blob-marker, distinguished only by key content-hash collision- impossibility) with one channel per value kind. Channel reorganization is cost-neutral — all six per-channel `io_get_info` + `#read_byte_stream` pairs are inlined at the single consumer of each channel, so no fn-call row gets added. New layout, tiered by access pattern: | Tier | Channel | Purpose | Key | Value | |-------|---------|--------------------------|-------------------------|----------------------| | Ctrl | 0 | claim wire bytes | `blake3(claim_bytes)` | claim bytes | | Ctrl | 1 | assumption tree bytes | `tree.root` | tree bytes | | Const | 2 | constant wire bytes | const addr | const bytes | | Const | 3 | Defn reducibility hint | Defn addr | single G | | Blob | 4 | blob discriminator | addr | one byte (1=const, 0=blob) | | Blob | 5 | blob raw bytes | blob addr | raw bytes | Tier 1 = one-per-`verify_claim`. Tier 2 = per-const. Tier 3 = per-blob. `io_get_info(channel, key)` is now unambiguous by channel alone. Aiur side (`Ingress.lean`, `Kernel/Claim.lean`): * `load_verified_constant` inlines `io_get_info(2, raw)` + `#read_byte_stream(2, ...)` (was ch 0). * `load_verified_blob` inlines ch 5 read (was ch 1). * `load_constant_hint` inlines ch 3 read (was ch 2). * `load_with_deps` inlines a ch 4 discriminator probe replacing the former `io_get_info(0, addr).len == 0 ⟹ blob` hack on ch 0. One read_byte for the per-addr discriminator byte. * `load_verified_claim` inlines ch 0 (was using a shared `load_payload_const` helper on ch 0). * `load_assumption_tree` inlines ch 1 (was using same `load_payload_const` helper on ch 0). * Helper wrappers (`load_payload_claim/tree/const/hint/blob`, `load_discriminator`, `ch_*` constants) all removed — every channel has exactly one consumer site and the inlined dispatch costs no fn call. * Top-of-file `IxVM IOBuffer interface` doc block documents the layout + soundness model. Lean side (`ClaimHarness.lean`): * `addEntries` writes per channel — consts → ch 2 + per-addr discriminator `[1]` on ch 4; blobs → ch 5 + per-addr discriminator `[0]` on ch 4; Defn hints → ch 3. * `seedTreeAt` writes tree bytes to ch 1 (was ch 0). * `buildClaimWitness` writes claim bytes to ch 0 (unchanged). * Drops the empty-marker write on ch 0 for blob addrs — the ch 4 discriminator covers blob/const classification explicitly. * Top-of-section `IxVM IOBuffer interface` doc block mirrors the Aiur-side comment. Soundness model unchanged. ch 0/1/2/5 byte streams are blake3-verified by the kernel against their content-addressed keys. ch 3 hint is semantically optional (controls WHNF reduction heuristic only; def-eq is sound either way). ch 4 discriminator is sound by erasure- correctness — a lying byte flips the const/blob decision and the wrong-path load downstream fails (a "const" blob triggers a ch 2 read that returns empty → blake3 verify against the non-empty addr fails; a "blob" const dangles references → typecheck fail). `lake test -- --ignored ixvm` green; 42 pins shifted within ±0.010% (noise-tier) vs the pre-shuffle baseline — channel reorganization is cost-neutral as required, the discriminator's one read_byte trades against the eliminated `io_get_info(0, ...).len` check.
Adds a new Aiur pipeline stage that translates `Bytecode.Toplevel`
into a Rust source module, one `fn aiur_fn_N` per Aiur function.
The generated code mirrors `src/aiur/execute.rs`'s QueryRecord
side effects exactly: same `function_queries.insert` timing, same
cache-hit multiplicity bumps, same memory-queries insertion order,
same `bytes{1,2}_queries` updates, same IO sequencing. Per-witness
trace hashes match the interpreter byte-for-byte on every const
tested (`Nat.add_comm`, `Vector.append`,
`Std.Time.Week.Offset.ofMilliseconds`).
* `Ix/Aiur/Stages/Codegen.lean`: structured Rust IR (`RustExpr` /
`RustStmt` / `RustItem` / `MatchArm` / labeled blocks) plus a
formatter. Codegen is a structural walk over the Bytecode AST,
no string templating. `EmitM = StateM EmitState` threads
`(nextVal, nextLabel)` so per-ValIdx Rust locals and per-MC
labels are fresh.
* Aiur's `map: Vec<G>` is gone in the generated kernel: every
ValIdx becomes a Rust local `__v_{i}: G`. Match-arm bodies snapshot
`nextVal` on entry so per-arm allocations don't leak to siblings.
`MatchContinue` / `Yield` use labeled-block + `break 'label
[G; OUT_SIZE]` to bubble yielded values out and rebind them at
outer scope.
* Deep Aiur recursion uses `stacker::maybe_grow(64 KiB, 4 MiB, …)`
at the top of every generated fn, so the native call stack grows
on demand rather than pre-reserving a giant thread stack. Verified
against shard 24 of the 64-way `init.ixes` partition, which
SEGFAULTs without it.
* `ix codegen`: new CLI command. Compiles the IxVM Aiur source,
walks the bytecode, writes the generated Rust to a fixed path
(`src/ix/aiur_ixvm.rs`). The output path is hard-coded; no
override.
* `src/ix/aiur_ixvm.rs`: the generated kernel, 743 `aiur_fn_*`
+ `execute_generated` dispatch. Auto-regenerated; do not edit.
`#![cfg_attr(rustfmt, rustfmt::skip)]` + a broad
`#![allow(unused_*, non_snake_case, clippy::all)]` since the
layout is for the compiler, not humans.
* `src/ix/aiur_ixvm_runner.rs`: `execute_ixvm(toplevel, fun_idx,
args, io_buffer) -> Result<(QueryRecord, Vec<G>), ExecError>`.
Same return shape as `Toplevel::execute`, but routes through
`execute_generated`.
* `src/aiur/synthesis.rs`: `AiurSystem::prove_ixvm(...)` —
same shape as `prove`, but the execute step calls
`execute_ixvm`. Verification-compatible (proofs from one path
verify under the other).
* `src/aiur/execute.rs`: helpers exposed for the codegen'd kernel
(`bytes{1,2}_execute` → `pub(crate)`,
`unconstrained_big_uint_div_mod_helper` extracted,
`CodegenBytes{1,2}{Op,}` aliases re-exported,
`QueryRecord::new` → `pub(crate)`, `ExecError::InvalidFunIdx`
added).
* `src/ffi/aiur/protocol.rs`: two new FFI exports —
`rs_aiur_toplevel_execute_ixvm` and `rs_aiur_system_prove_ixvm`.
Same wire format as the existing `_execute` / `prove` exports.
* `Ix/Aiur/Semantics/BytecodeFfi.lean`:
`Bytecode.Toplevel.executeIxVM` — mirror of `execute`.
* `Ix/Aiur/Protocol.lean`: `AiurSystem.proveIxVM`.
* `Ix/Cli/CheckCmd.lean`: `runCompiled` now dispatches through
`executeIxVM`. The Rust bytecode interpreter is no longer
reachable from `ix check` (the Lean-side `--interp` fallback
stays for richer error diagnostics).
* `Ix/Cli/ProveCmd.lean`: `proveOne` uses `proveIxVM`.
* `Cargo.toml`: `stacker = "0.1"`.
For witnesses where execute is the dominant work (single-const
checks via `ix check 'X'`), the codegen'd kernel is ~1.6× faster
than the bytecode interpreter end-to-end (measured on
`Nat.add_comm`, `Vector.append`,
`Std.Time.Week.Offset.ofMilliseconds`). Tiny consts are within
~10% (stacker probe overhead amortises). Peak RSS is within a few
MB of the interpreter — stacker grows the stack on demand, no
giant upfront reservation.
For shard-driven `ix check --shard K` runs the speedup is
invisible: per-shard wall time is dominated (~92%) by
`buildShardCheckEnvWitness` in Lean, NOT by the kernel execute
(~8%). Cutting witness construction is the next lever.
`Nat.add_comm` proof produced via `proveIxVM` verifies under the
existing `AiurSystem::verify`. End-to-end:
lake exe ix prove 'Nat.add_comm' → proof addr 0d0ab0f9…
lake exe ix verify 0d0ab0f9… → ok
Replaces `IxVM.ClaimHarness.buildShardCheckEnvWitness` (Lean side, ~92% of shard wall time on heavy partitions) with a Rust port that builds the `aiur::execute::IOBuffer` directly, without per-byte boxing into Lean `Aiur.G` values. The two hot phases run on rayon: * Closure walk: each owned addr's transitive `Constant.refs` + projection-block traversal runs on its own thread; results are deduped through a `dashmap::DashSet`. * Byte-to-G conversion: per-const `(key, data)` tuples are built in parallel chunks of 256; final IOBuffer assembly (channel arena append + key→`IOKeyInfo` map insert) runs serially because the arena index is monotonic. * `src/ix/aiur_ixvm_witness.rs` (new): `build_shard_check_env_witness` produces `(claim, claim_digest_input, io_buffer)` ready to feed to `execute_ixvm`. Mirrors the 6-channel layout documented in `Ix/IxVM/ClaimHarness.lean` (claim / asm tree / const bytes / Defn hint / blob discriminator / blob raw bytes). * `src/ffi/aiur/protocol.rs`: new FFI `rs_aiur_toplevel_shard_check_ixvm` bundles witness build + `execute_ixvm` into one cross-language trip. Returns the same `(output, ioBuffer, queryCounts)` shape as `rs_aiur_toplevel_execute_ixvm`, so the Lean shim stays drop-in compatible. * `Ix/Aiur/Semantics/BytecodeFfi.lean`: `Bytecode.Toplevel.shardCheckIxVM` wraps the FFI. * `Ix/Cli/CheckCmd.lean`: shard-mode `ix check` (`--ixe + --ixes`) now dispatches through `runShardOwnedNative` → `shardCheckIxVM`, bypassing the Lean witness builder. Single- shard and whole-partition paths share the fast route; the legacy `runShardCheckManifest` / `runShardCheckAll` callbacks stay live for `--interp` only. Shard 26 of the 64-way `init.ixes` partition, on the same host: | Variant | total | speedup | | --- | --: | --: | | Lean witness + bytecode-interp (baseline)| 1029 s | 1.0× | | Lean witness + codegen kernel | 1015 s | 1.01× | | Rust witness (serial) + codegen | 101 s | 9.95× | | Rust witness (parallel) + codegen | 80 s | 12.9× | The serial Rust witness collapses the 935 s of Lean per-byte boxing into ~23 s; rayon hides that 23 s under the codegen-kernel execute (~78 s) so witness build effectively becomes free at the shard level. FFT cost matches the bytecode interpreter exactly (107_006_963_281), so the QueryRecord layout is bit-identical across all four variants. * Closure walk uses single-source BFS with the global visited set shared via `DashSet::contains` checks before pushing to the per-thread stack — avoids redundant work without a per-owned union step. * Per-channel ordering: each chunk's partial `ChannelEntries` feeds the serial fold in iteration order, so channel arena contents match the Lean side's exactly. Test still relies on trace-hash parity from the earlier codegen commit, which exercised the same path through the bytecode interpreter and the codegen kernel. * Coverage check is currently skipped on the IxVM-native whole-partition path (`runShardManifestAllNative`); legacy `runShardCheckAll` still does it for `--interp`. Re-enable separately if needed.
plumbing, unchecked cache-hit array copy
Three targeted cuts to the generated kernel (`src/ix/aiur_ixvm.rs`);
all preserve QueryRecord parity (shard 26 FFT cost unchanged at
107_006_963_281).
Every `U8*` op previously emitted a `Vec<G>` scratch (~245 sites):
let __b2_out: [G; 1] = {
let mut __scratch: Vec<G> = vec![__v_i, __v_j];
if unconstrained { __scratch.extend(vec![Bytes2::xor(...)]); }
else { bytes2_execute(0, 1, &Bytes2Op::Xor, &mut __scratch, record); }
let __arr: [G; 1] = __scratch[2..].try_into().unwrap();
__arr
};
Net cost: a 2-element `vec![]` alloc + a `__scratch.extend(vec![..])`
(another small alloc inside `Bytes2::execute`'s returned Vec) + a
slice→array `try_into().unwrap()`.
Added per-op `bytes{1,2}_*_value(args..., record) -> G | (G,G) | (G,G,G) |
[G; 8]` in `src/aiur/execute.rs` — each bumps the corresponding
`bytes{1,2}_queries.bump_*` and returns the gadget output by value.
The pure (`Bytes{1,2}::*`) helpers stay the unconstrained shortcut.
Codegen now emits:
let __v_n: G = if unconstrained { Bytes2::xor(&__v_i, &__v_j) }
else { bytes2_xor_value(__v_i, __v_j, record) };
Zero `Vec<G>` allocation per byte op. `U8Add`/`U8Sub` also gain
because `bytes2_add_value` runs the `Bytes2::add` gadget once
(returning the full `(low, carry)`) instead of the interpreter's
two `Bytes2::add` calls (one for the carry, one for the low push).
To make `bump_*` callable from `execute.rs`, all `Bytes1Queries` /
`Bytes2Queries::bump_*` methods are now `pub(crate)`. No behaviour
change.
The `Op::Call` cache-check emitted `unconstrained || OP_UN` and
`!unconstrained && !OP_UN` even when the static `OP_UN` was `false`,
which is the common case. Codegen now emits the folded shape
directly:
let __cu = unconstrained; // was: unconstrained || false
if !unconstrained { *result.multiplicity += G::ONE; }
// was: !unconstrained && !false
When `OP_UN == true`, the callee always runs unconstrained AND the
multiplicity bump is always suppressed; codegen folds to `let __cu =
true;` and elides the bump branch entirely.
LLVM was already folding these, so this is mostly source-cleanliness
and a small frontend win.
The cache-hit branch read `result.output` (an `&[G]`) and converted
it back to `[G; OUT_N]` via `.try_into().unwrap()`. The length is
statically `OUT_N` — we control the producer (the matching
`aiur_fn_{callee}::Ctrl::Return` inserts an `[G; OUT_N]`-typed array
into the same slot). The runtime bounds-check + Result discharge is
dead work.
Codegen now emits:
let __ret: [G; OUT_N] = unsafe {
*(result.output.as_ptr() as *const [G; OUT_N])
};
Sound: same-fn slot, fixed `OUT_N`, no aliasing.
| Variant | mean |
| --- | --: |
| Rust witness (parallel) + codegen | 80 s |
| + value-helper byte ops (#1) | 64 s |
| + folded `__cu` + unsafe cache copy (#2+#3) | 62 s |
The bulk of the win is #1 (#1 alone: ~−18%). #2+#3 are a further
~3% — mostly noise / Rust-frontend cleanup since LLVM already folds
the constant `false` ops.
* `src/aiur/execute.rs`: 12 value-returning byte helpers added
(`bytes1_bit_decompose_value`, `bytes1_shift_left_value`,
`bytes1_shift_right_value`, `bytes2_{xor,and,or,less_than,mul,
chain_rotr7,chain_rotr4,add,sub}_value`).
* `src/aiur/gadgets/bytes1.rs`, `src/aiur/gadgets/bytes2.rs`: all
`bump_*` upgraded to `pub(crate)` so the value helpers can call
them.
* `Ix/Aiur/Stages/Codegen.lean`: `emitU8Bytes1` / `emitU8Bytes2` /
`emitU8Add` / `emitU8Sub` rewritten to call the per-op value
helpers — no scratch Vec, no slice→array conversion. `emitCall`
constant-folds `opUn = false` and emits the unsafe cache-hit
copy. Prelude imports updated to bring the new helpers into
scope.
* `src/ix/aiur_ixvm.rs`: regenerated. 245 `Vec<G>` scratch sites
→ 11 (only `unconstrainedBigUintDivMod`'s scratch left); 3324
`unconstrained || false` → 0; 3000+ `result.output.try_into()
.unwrap()` → 0.
* `Ix/Cli/CheckCmd.lean`: incidental cleanup of probe-only timing
prints in `runShardOwnedNative` (added during measurement, no
longer needed).
Per-claim mode (a `Claim.check addr none`) builds a closure rooted at the target address. That closure can be the whole environment when the target is a heavily-shared root constant. The Lean witness builder paid per-byte boxing into `Aiur.G` for every byte in that closure — the same cost we already eliminated for shard mode. # Surface Five new FFIs, all bundling witness build + execute_ixvm (and STARK prove where applicable) into one cross-language trip so the `IOBuffer` never crosses the boundary mid-pipeline: * `rs_aiur_toplevel_check_addr_ixvm` — `Bytecode.Toplevel.checkAddrIxVM (toplevel) (funIdx) (ixePath) (addrBytes)`: per-claim check; builds witness for `Claim.check addr none` via `build_claim_check_witness`, runs `execute_ixvm`. Same return shape as `shardCheckIxVM`. * `rs_aiur_toplevel_check_env_bytes_ixvm` — same as above but takes the env as a serialized byte blob (`Ixon.serEnv`) instead of a `.ixe` path. Used by the compiled-Lean-env code path (`ix check NAME` without `--ixe`), where the env is built in Lean memory. * `rs_aiur_system_prove_addr_ixvm` — `AiurSystem.proveAddrIxVM (...)`: per-claim prove (witness + execute + STARK prove all in Rust). * `rs_aiur_system_prove_env_bytes_ixvm` — bytes-blob counterpart. * `rs_aiur_system_shard_prove_ixvm` — `AiurSystem.shardProveIxVM (...)`: per-shard prove end-to-end in Rust. `build_claim_check_witness` lives next to `build_shard_check_env_witness` in `src/ix/aiur_ixvm_witness.rs` and uses the same parallel closure walk + parallel byte→G conversion. The bytes-blob FFIs additionally harvest `anon_hints` from each `Def` named entry after `Env::get` — `Env::get` (full form, used by the bytes-blob path) doesn't populate `env.anon_hints` the way `Env::get_anon` does, but the kernel's `verify_claim` reads ch 3 (Defn reducibility hints) so the hints must end up in the env. This mirrors the harvest pass already inside `get_anon` at `src/ix/ixon/serialize.rs:1683`. # Wiring `Ix.Cli.CheckCmd.WitnessSource`: a three-arm discriminated union threaded through `forEachClaim` (the shared check/prove driver): * `.native ixePath addr` — `.ixe`-backed env, Rust mmap. * `.nativeBytes envBytes addr` — Lean-memory env serialized to bytes, Rust decodes via `Env::get`. * `.lean witness` — pre-built `ClaimWitness` (fallback). `runCompiled` and `proveOne` dispatch on the source. # Coverage | Command | Path | |---------------------------------------|-----------------------------------------------| | `ix check NAME --ixe` | **`checkAddrIxVM`** (new) | | `ix check NAME` (no `--ixe`) | **`checkEnvBytesIxVM`** (new) | | `ix check --claim hex --ixe` | `checkAddrIxVM` for `check addr none` | | `ix check --ixes --shard K` | `shardCheckIxVM` (existed) | | `ix check --ixes` | `shardCheckIxVM` (existed) | | `ix prove NAME --ixe` | **`proveAddrIxVM`** (new) | | `ix prove NAME` (no `--ixe`) | **`proveEnvBytesIxVM`** (new) | | `ix prove --ixes --shard K` | **`shardProveIxVM`** (new) | | `ix prove --ixes` | **`shardProveIxVM`** loop (new) | | `Benchmarks/Typecheck.lean` Phase 1 | `checkAddrIxVM` / `executeIxVM` | | `Benchmarks/Typecheck.lean` Phase 2 | `proveAddrIxVM` / `proveIxVM` | | `Benchmarks/IxVM.lean` | `proveIxVM` (was `prove`) | `--interp` route is preserved: it materialises any `WitnessSource` back into a `ClaimWitness` before driving the Aiur source interpreter. `--claim <hex>` over non-`check addr none` variants (`eval` / `reveal` / `contains` / `checkEnv-with-asm`) still uses the Lean witness builder. # Sanity * `ix check Nat.add_comm --ixe init.ixe` (warm): 3.1 s wall, FFT = 23_603_449. * `ix check Nat.add_comm` (compiled env via bytes blob, warm): 2.4 s wall, FFT = 23_603_449. # Known overheads (tracked for a follow-up redesign) * **Per-claim env re-parse on `--ixe + many names`**: each FFI call re-runs `Env::get_anon_mmap` on the same path. Mathlib-scale iteration pays the lazy-index build N times instead of once. * **`--interp + .nativeBytes` round-trip**: Lean serializes the compiled env then `materialise` deserializes it back, just to feed `runInterp`. * **`runShardProveNative` claim reconstruct**: redoes the closure walk + canonical `AssumptionTree` build Lean-side after `shardProveIxVM` already computed the same claim internally. A future `EnvHandle`-based API would close all three: build the env once into a Rust-owned handle, pass the handle to every per-claim/per-shard FFI, and have prove FFIs return the serialized claim bytes.
Before: every per-claim and per-shard FFI re-parsed the Ixon env (`Env::get_anon_mmap` per call). On `--ixe + many names` / all-shards prove, this is the dominant overhead for iteration-heavy workflows — O(num_consts) lazy-index build × N targets. After: the env lives once per CLI invocation in a Rust-owned `EnvHandle`. Lean holds an opaque `Aiur.EnvHandle` reference and threads `@& EnvHandle` through every per-target FFI call. The env is parsed exactly once at handle construction; downstream calls share it. # Surface Five new FFIs collapse the previous six per-call variants (`checkAddrIxVM`, `checkEnvBytesIxVM`, `shardCheckIxVM`, `proveAddrIxVM`, `proveEnvBytesIxVM`, `shardProveIxVM` — all deleted): * `rs_aiur_env_handle_from_ixe(path) → LeanExternal<EnvHandle>`: mmap-load via `Env::get_anon_mmap`. Anon parser already harvests `anon_hints`; no post-pass. * `rs_aiur_env_handle_from_bytes(blob) → LeanExternal<EnvHandle>`: decode `Ixon.serEnv`-shape blob via `Env::get` + harvest `anon_hints` from each `Def` named entry. Used by the compiled-Lean-env path. * `rs_aiur_toplevel_check_addr_with_env(toplevel, fun_idx, handle, addr_bytes)`: per-claim check. Reuses handle's parsed env. * `rs_aiur_toplevel_shard_check_with_env(toplevel, fun_idx, handle, owned_blob)`: per-shard check. * `rs_aiur_system_prove_addr_with_env(system, fri, fun_idx, handle, addr_bytes) → (claim_bytes, proof, ioBuffer)`: per-claim prove. Rust serializes the reconstructed `Ix.Claim` via `ixon::Claim::put` so Lean can deserialize via `Ixon.runGet Ix.Claim.get` directly — no closure walk + canonical `AssumptionTree` recomputation Lean-side. * `rs_aiur_system_shard_prove_with_env(system, fri, fun_idx, handle, owned_blob) → (claim_bytes, proof, ioBuffer)`: per-shard prove. # Lean * `Aiur.EnvHandle` opaque type with `fromIxe` / `fromBytes`. * `Aiur.Bytecode.Toplevel.checkAddrWithEnv` / `shardCheckWithEnv`, `Aiur.AiurSystem.proveAddrWithEnv` / `shardProveWithEnv`. * `Ix.Cli.CheckCmd.Target` replaces `WitnessSource`: ```lean inductive Target where | addr (a : Address) -- Claim.check addr none | shard (owned : Array Address) -- Claim.checkEnv | leanW (w : ClaimWitness) -- --interp, --claim non-check ``` `runCompiled` / `proveOne` take `(envHandle?, target)`. The envHandle is `none` only for `.leanW` (`--interp` legacy path). * `runShardOwnedNative` and the manifest-driver helpers take the envHandle as a parameter. Single-shard mode builds it once; all-shards mode reuses the same handle across every shard's FFI call (eliminates per-shard re-mmap). * `runShardProveNative` deserializes the wire claim bytes via `Ixon.runGet Ix.Claim.get` instead of re-running `shardCheckEnvClaim`. # Benchmarks `Benchmarks/Typecheck.lean` builds the envHandle once before Phase 1, reuses it across Phase 1 (execute) + Phase 2 (prove). Both phases now go through `checkAddrWithEnv` / `proveAddrWithEnv` on the full-closure path. `--subject-only` still uses Lean `buildVerifyConst` + `executeIxVM` / `proveIxVM` (witnesses intentionally small there). # New crate `crates/ix/src/env_handle.rs` — the `EnvHandle` struct + constructors live in the existing `ix` crate next to the witness builder and codegen runner. # Sanity * Multi-target check (warm): `ix check --ixe init.ixe Nat.add_comm Nat.add` → 3.3 s wall, single envHandle shared across both targets. * Shard 26 check (warm): `ix check --ixe init.ixe --ixes init.ixes --shard 26` → ~78 s wall, FFT = 107_006_963_281 (parity preserved). # Coverage | Command | Path | |--------------------------------------|-----------------------------------| | `ix check NAME --ixe` | `checkAddrWithEnv` | | `ix check NAME` (no `--ixe`) | `checkAddrWithEnv` + fromBytes | | `ix check --claim hex --ixe` | `checkAddrWithEnv` for check-none | | `ix check --ixes --shard K` | `shardCheckWithEnv` | | `ix check --ixes` | `shardCheckWithEnv` × all shards | | `ix prove NAME --ixe` | `proveAddrWithEnv` | | `ix prove --ixes --shard K` | `shardProveWithEnv` | | `ix prove --ixes` | `shardProveWithEnv` × all shards | | `Benchmarks/Typecheck` Phase 1+2 | `checkAddrWithEnv` / `proveAddrWithEnv` | Only `--interp` and `--claim hex` over a non-`check addr none` persisted claim still build a Lean `ClaimWitness`; both go via the `.leanW` target arm.
`ix codegen --check` compares the emitted Rust source against the on-disk `crates/ix/src/aiur_ixvm.rs` and exits 0 if identical, 1 otherwise. No write side effect. ~2 s warm; fast enough to gate on every PR. Wired into the `lean-test` CI job so a forgotten regen on a kernel-touching PR fails CI instead of merging stale generated code that drifts from the Bytecode → Rust emitter.
Two interpreter modes behind a single `--interp` flag: * `--interp source`: Aiur source interpreter (`Aiur.runFunction` over the source-level `Decls`). Richer per-step error diagnostics. * `--interp bytecode`: generic Aiur bytecode interpreter (`Bytecode.Toplevel.execute`, `rs_aiur_toplevel_execute` route). Skips the `ix codegen` + `cargo build --release` cycle needed after editing `Ix/IxVM/*.lean` — the bytecode is rebuilt Lean-side at exe load. Slower per-check than the codegen kernel; ideal for tight iteration on the IxVM source. * omit the flag entirely for the native codegen kernel (default). Invalid values (e.g. `--interp foo`) fail fast with a clear error. # Plumbing The `check_addr_with_env` and `shard_check_with_env` FFIs take a `use_bytecode: bool` and dispatch via a shared `dispatch_execute` helper. `--interp bytecode` sets it to `true`. The `.leanW` arm of `runCompiled` also picks between `Bytecode.Toplevel.execute` and `executeIxVM` based on the same flag. The `--interp source` path requires a Lean `ClaimWitness`. The existing driver had switched to `.addr`/`.shard` targets against the Rust-owned `EnvHandle`; source-interp had no way to materialise those. `forEachClaim` now takes a `forceLeanWitness : Bool` — when true, it builds a Lean witness for every target (via `mkWitness` / `loadIxonEnv` for the compiled-Lean-env path) and passes `.leanW` so `runInterp` can consume it directly. # Sanity * `ix check Std.Time.Week.Offset.ofMilliseconds` (warm): 9.3 s wall, FFT = 12_430_516_949 (codegen kernel). * `ix check --interp bytecode Std.Time.Week.Offset.ofMilliseconds` (warm): 14.6 s wall, FFT = 12_430_516_949 (bytecode interp). * `ix check --interp source Eq` (warm): 6.3 s wall, output `Eq: ()`. * `ix check --interp garbage Nat.add_comm`: exits 1 with clear error.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Codegen the IxVM Aiur kernel to Rust, run witness/prove Rust-side against a Rust-owned env, and wire the fast path through every
ix check/ix proveentry point.Wall time on shard 26 of the 64-way
init.ixespartition: 1029 s → 62 s (~16× end-to-end).What's here
Eight commits, roughly grouped:
Codegen kernel
Bytecode.Topleveland emits onefn aiur_fn_N(...)per Aiur function intocrates/ix/src/aiur_ixvm.rs(~4.8 MB, 743 fns). SameQueryRecordshape as the interpreter — proof-verification compatible.Vec<G>scratch buffer per byte op (~245 sites) via per-op helpers (bytes2_xor_value,bytes2_add_value, etc.) that bump the byte-chip queries and return the gadget output directly. Also dead-foldsunconstrained || false/!unconstrained && !falseand uses an unchecked array copy on Call cache hits.Rust-owned witness + env
IxVM.ClaimHarness.buildShardCheckEnvWitnessto Rust. Parallel closure walk (rayon +DashSet) + parallel byte→G conversion. Eliminates the per-byte boxing intoAiur.Gthat dominated per-shard wall time.check_addr/prove_addr/shard_prove, plus bytes-blob variants for the compiled-Lean-env code path.EnvHandle: Rust-ownedixon::Envexposed to Lean as an opaque handle. Collapses six per-call FFIs into four*_with_envFFIs + two constructors (fromIxe/fromBytes). The env is parsed exactly once per CLI invocation; every per-target FFI call reuses the same handle. Prove FFIs returnclaim_bytesso Lean deserializes the wire claim instead of recomputing it.CI + ergonomics
ix codegen --check. CI-facing flag that compares the emitted Rust source against the on-disk file and exits 1 on drift. Wired intolean-test. Warm ~2 s.ix check --interp {source|bytecode}. Single flag with two modes.bytecodebypasses the codegen kernel via the generic Aiur bytecode interpreter — noix codegen+cargo buildcycle when iterating onIx/IxVM/*.lean.sourcereaches the Aiur source interpreter for its richer per-step diagnostics.Coverage
Every
ix check/ix proveshape now routes through the native pipeline:ix check NAME --ixecheckAddrWithEnv(codegen kernel)ix check NAME(no--ixe)checkAddrWithEnv+EnvHandle.fromBytesix check --claim hex --ixecheckAddrWithEnvforcheck addr noneix check --ixes --shard KshardCheckWithEnvix check --ixesshardCheckWithEnv× all shards, shared envix prove NAME --ixeproveAddrWithEnvix prove --ixes --shard KshardProveWithEnvix prove --ixesshardProveWithEnv× all shardsBenchmarks/TypecheckPhase 1 + 2EnvHandleacross execute + proveFallbacks:
--interp source— Aiur source interpreter (richer errors).Target.leanWmaterialised Lean-side.--interp bytecode— generic bytecode interpreter, skips codegen/rebuild cycle.--claim <hex>for non-check addr nonevariants (eval/reveal/contains/checkEnv-with-asm) — Lean witness builder.FFI surface
Six new FFIs replace the ten single-use ones deleted along the way:
Measurements
Shard 26 of the 64-way
init.ixespartition (blake3 / defeq / Nat-heavy):EnvHandle(env parsed once)FFT cost
107_006_963_281on every variant — bit-identicalQueryRecordtraces across all backends.Sanity across three entry points (warm):
ix check Nat.add_comm(compiled env,EnvHandle.fromBytes)ix check --ixe init.ixe Nat.add_commix check Std.Time.Week.Offset.ofMilliseconds(codegen)ix check --interp bytecode Std.Time.Week.Offset.ofMillisecondsix check --interp source EqEq: ())New crate
crates/ix/(ix) — IxVM-specific glue. Depends onaiur,common,ixon.aiur_ixvm.rs— codegen'd kernel (regenerated byix codegen).aiur_ixvm_runner.rs— thinexecute_ixvmwrapper that routes throughexecute_generated.aiur_ixvm_witness.rs— Rust witness builders (build_claim_check_witness,build_shard_check_env_witness) with parallel closure walk + byte-to-G conversion.env_handle.rs—EnvHandlewrappingixon::Envwithanon_hintspopulated.CI
lean-testnow runslake exe ix codegen --check(exits 1 ifcrates/ix/src/aiur_ixvm.rsdoesn't match what the emitter would produce). Prevents merging stale generated kernels that drift from the Bytecode → Rust pass.Notes for reviewers
crates/ix/src/aiur_ixvm.rsis generated. Ignore the diff; regeneration is deterministic and CI-gated.--interp sourcepath still requires a Lean-sideClaimWitness.forEachClaimtakes aforceLeanWitness : Booland materialises one viamkWitnesswhen the source interpreter is selected.EnvHandlekeeps the mmap alive across per-target calls:Env::get_anon_mmapreturns per-constantArc<Mmap>windows, and the handle owns the Env; Lean's reference-countedLeanExternal<EnvHandle>drives the drop.ffiintoaiur::AiurSystem::prove_ixvmsoaiurdoesn't need to depend onix(which depends onaiur).