IxVM: native execution pipeline by arthurpaulino · Pull Request #463 · argumentcomputer/ix

arthurpaulino · 2026-07-01T11:19:05Z

Codegen the IxVM Aiur kernel to Rust, run witness/prove Rust-side against a Rust-owned env, and wire the fast path through every ix check / ix prove entry point.

Wall time on shard 26 of the 64-way init.ixes partition: 1029 s → 62 s (~16× end-to-end).

What's here

Eight commits, roughly grouped:

Codegen kernel

Aiur Bytecode → Rust codegen pass. Walks the compiled IxVM Bytecode.Toplevel and emits one fn aiur_fn_N(...) per Aiur function into crates/ix/src/aiur_ixvm.rs (~4.8 MB, 743 fns). Same QueryRecord shape as the interpreter — proof-verification compatible.
Value-returning byte ops in the codegen output. Eliminates the Vec<G> scratch buffer per byte op (~245 sites) via per-op helpers (bytes2_xor_value, bytes2_add_value, etc.) that bump the byte-chip queries and return the gadget output directly. Also dead-folds unconstrained || false / !unconstrained && !false and uses an unchecked array copy on Call cache hits.

Rust-owned witness + env

Port IxVM.ClaimHarness.buildShardCheckEnvWitness to Rust. Parallel closure walk (rayon + DashSet) + parallel byte→G conversion. Eliminates the per-byte boxing into Aiur.G that dominated per-shard wall time.
Wire the Rust witness fast path through per-claim check/prove and both benchmarks. Adds FFIs for check_addr / prove_addr / shard_prove, plus bytes-blob variants for the compiled-Lean-env code path.
EnvHandle: Rust-owned ixon::Env exposed to Lean as an opaque handle. Collapses six per-call FFIs into four *_with_env FFIs + two constructors (fromIxe / fromBytes). The env is parsed exactly once per CLI invocation; every per-target FFI call reuses the same handle. Prove FFIs return claim_bytes so Lean deserializes the wire claim instead of recomputing it.

CI + ergonomics

ix codegen --check. CI-facing flag that compares the emitted Rust source against the on-disk file and exits 1 on drift. Wired into lean-test. Warm ~2 s.
ix check --interp {source|bytecode}. Single flag with two modes. bytecode bypasses the codegen kernel via the generic Aiur bytecode interpreter — no ix codegen + cargo build cycle when iterating on Ix/IxVM/*.lean. source reaches the Aiur source interpreter for its richer per-step diagnostics.

Coverage

Every ix check / ix prove shape now routes through the native pipeline:

Command	Path
`ix check NAME --ixe`	`checkAddrWithEnv` (codegen kernel)
`ix check NAME` (no `--ixe`)	`checkAddrWithEnv` + `EnvHandle.fromBytes`
`ix check --claim hex --ixe`	`checkAddrWithEnv` for `check addr none`
`ix check --ixes --shard K`	`shardCheckWithEnv`
`ix check --ixes`	`shardCheckWithEnv` × all shards, shared env
`ix prove NAME --ixe`	`proveAddrWithEnv`
`ix prove --ixes --shard K`	`shardProveWithEnv`
`ix prove --ixes`	`shardProveWithEnv` × all shards
`Benchmarks/Typecheck` Phase 1 + 2	shared `EnvHandle` across execute + prove

Fallbacks:

--interp source — Aiur source interpreter (richer errors). Target.leanW materialised Lean-side.
--interp bytecode — generic bytecode interpreter, skips codegen/rebuild cycle.
--claim <hex> for non-check addr none variants (eval / reveal / contains / checkEnv-with-asm) — Lean witness builder.

FFI surface

Six new FFIs replace the ten single-use ones deleted along the way:

// Constructors
rs_aiur_env_handle_from_ixe(path)     -> LeanExternal<EnvHandle>
rs_aiur_env_handle_from_bytes(bytes)  -> LeanExternal<EnvHandle>

// Check (returns output + ioBuffer + queryCounts)
rs_aiur_toplevel_check_addr_with_env  (toplevel, fun_idx, handle, addr_bytes,  use_bytecode)
rs_aiur_toplevel_shard_check_with_env (toplevel, fun_idx, handle, owned_blob,  use_bytecode)

// Prove (returns claim_bytes + proof + ioBuffer)
rs_aiur_system_prove_addr_with_env    (system, fri, fun_idx, handle, addr_bytes)
rs_aiur_system_shard_prove_with_env   (system, fri, fun_idx, handle, owned_blob)

Measurements

Shard 26 of the 64-way init.ixes partition (blake3 / defeq / Nat-heavy):

Variant	Wall	Speedup
Baseline (Lean witness + bytecode interp)	1029 s	1.0×
Lean witness + codegen kernel	1015 s	1.01×
Rust witness (serial) + codegen kernel	101 s	10 ×
Rust witness (parallel) + codegen kernel	80 s	12.9×
+ value-helper byte ops	64 s	16.1×
+ `EnvHandle` (env parsed once)	62 s	16.6×

FFT cost 107_006_963_281 on every variant — bit-identical QueryRecord traces across all backends.

Sanity across three entry points (warm):

Command	Wall	FFT (G)
`ix check Nat.add_comm` (compiled env, `EnvHandle.fromBytes`)	2.4 s	23_603_449
`ix check --ixe init.ixe Nat.add_comm`	3.1 s	23_603_449
`ix check Std.Time.Week.Offset.ofMilliseconds` (codegen)	9.3 s	12_430_516_949
`ix check --interp bytecode Std.Time.Week.Offset.ofMilliseconds`	14.6 s	12_430_516_949
`ix check --interp source Eq`	6.3 s	(`Eq: ()`)

New crate

crates/ix/ (ix) — IxVM-specific glue. Depends on aiur, common, ixon.

aiur_ixvm.rs — codegen'd kernel (regenerated by ix codegen).
aiur_ixvm_runner.rs — thin execute_ixvm wrapper that routes through execute_generated.
aiur_ixvm_witness.rs — Rust witness builders (build_claim_check_witness, build_shard_check_env_witness) with parallel closure walk + byte-to-G conversion.
env_handle.rs — EnvHandle wrapping ixon::Env with anon_hints populated.

CI

lean-test now runs lake exe ix codegen --check (exits 1 if crates/ix/src/aiur_ixvm.rs doesn't match what the emitter would produce). Prevents merging stale generated kernels that drift from the Bytecode → Rust pass.

Notes for reviewers

crates/ix/src/aiur_ixvm.rs is generated. Ignore the diff; regeneration is deterministic and CI-gated.
The --interp source path still requires a Lean-side ClaimWitness. forEachClaim takes a forceLeanWitness : Bool and materialises one via mkWitness when the source interpreter is selected.
EnvHandle keeps the mmap alive across per-target calls: Env::get_anon_mmap returns per-constant Arc<Mmap> windows, and the handle owns the Env; Lean's reference-counted LeanExternal<EnvHandle> drives the drop.
The prove-path executor is injected as a closure from ffi into aiur::AiurSystem::prove_ixvm so aiur doesn't need to depend on ix (which depends on aiur).

Replace `klimbs_mul(base, klimbs_pow(base, klimbs_dec(exp)))`'s O(exp) per-step memo growth with a binary-exponentiation recursion of depth O(log exp). Both `exp / 2` and `exp % 2` route through the native `unconstrained_big_uint_div_mod` op, so each level pays O(1) for the division — `log2(exp)` memo entries for `klimbs_pow` itself + the `klimbs_mul(base, base)` doublings. `lake test -- --ignored ixvm` on `IxVMPrim.nat_pow_big` (`(2 ^ 16384 : Nat) - 2 ^ 16384 = 0`, measured against the current `ap/utf8-mod-witness` baseline): 3_941_276_885 → 71_856_587 FFT (-98.2%). Tests/Ix/IxVM.lean: * new `IxVMPrim.nat_pow_big` synthetic — exercises `klimbs_pow` at a non-trivial exponent (`2 ^ 16384`) so future regressions in the body shift its FFT pin. * small pin shifts in `nat_shl_lit`, `nat_shr_lit`, `nat_div_lit`, `nat_mod_lit`, `nat_gcd_lit`, `bv_to_nat_lit` — all reduce via `klimbs_pow` / `klimbs_div_mod` and absorb the new cost shape.

`lookup_addr_pos`'s rbtree (`addr_pos_map`) only contained const addresses. Every blob ref (literal-blob payload pointers in `Constant.refs`) probed the map, missed, and fell through to the O(N) linear scan over `all_addrs` which also returned 0. Wasted work proportional to (blob refs × shard closure size). Add a third value class to the same rbtree: the sentinel `4294967295` (beyond any honest `pos+1`), inserted by walking each const's `refs` once via `augment_with_blob_refs`. Any ref not already mapped to a const-position gets the sentinel. `lookup_addr_pos` becomes a 3-way match: `0` → fall back to linear scan (now only fires under the de-intern soundness corner-case, ~never in practice); `SENTINEL` → return 0 directly (known blob); `pos+1` → return `pos`. `is_blob` mirrors the same match. Soundness preserved: every probe still relies on the positive direction `ptr_val` equality ⇒ content equality (Aiur Store's content-addressing invariant). The linear fallback stays put for the (still theoretical) malicious de-intern, where it uses content-based `address_eq`. `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 51` (`ulimit -v 23000000`, with the kernel's blake3 paths in `load_verified_constant` / `load_verified_blob` / `load_verified_claim` stubbed for profiling): 119_152_044_062 → 102_190_540_489 FFT (-14.2%). Tests/Ix/IxVM.lean: 42 pin shifts absorbing the per-ingress augment-walk overhead.

A consolidated refactor of the ingress pipeline guided by a 5-reviewer synthesis of accidental complexity sites. Three families of change: PLUMBING * `Blake3.lean`: `verify_bytes_against(bytes, expected)` + `bytes_to_addr(bytes)` + `blake3_flat(input)` centralise the 24-line `[h[i][j]]` digest reshape that was hand-unrolled at 7 sites. ~100 lines of boilerplate deleted. * `Ingress.lean`: `load_payload_const`/`blob`/`hint` + `ch_const`/`blob`/`hint` factor the `io_get_info` + `#read_byte_stream` boilerplate at 6 sites. `load_verified_blob` no longer double-`load(addr)`s. * `sentinel_blob_ref()` named const replaces one bare `4294967295`. SIDECAR LISTS → RBTREE * `lookup_canon_addr`: O(N) parallel-list walk → O(log N) rbtree probe via new `build_canon_addr_map`. `canon_addrs: List<Addr>` arg replaced by `canon_addr_map: &RBTreeMap<Addr>` (one threaded ptr). * `lookup_block_start`: O(N) parallel-list walk → O(log N) rbtree probe via new `build_block_start_map`. `block_addrs` / `block_starts` args replaced by `block_start_map: &RBTreeMap<G>` across ~12 fn signatures. * `build_aux_recr_ctor_idxs` returns `(idxs, block_addr)` so the standalone-Recr caller in `build_convert_inputs_walk` doesn't redo the `rec_typ_to_inductive_addr` + `load_verified_constant` chain to recover `block_addr` — `build_aux_recr_ctor_idxs` already computed it internally. * `build_ref_idxs_and_blobs` fuses the two separate walks (`build_ref_idxs_mapped` + `build_lit_blobs`) into one — single rbtree probe per ref produces both `ref_idxs` and `lit_blobs`. 5 caller sites simplified. PTR-PASSING (per `reference_aiur_pass_pointer_not_value`) * `convert_univ(u: &Univ)`: per-row input width drops from 5-variant union to one G column. Caller `convert_univ_idxs` already produces the `&Univ` via `list_lookup(univs, idx)` — no extra `store`. * `convert_one(input: &ConvertInput)`: caller `convert_all` already has `&input` from the `ListNode.Cons` destructure. DEAD CODE DELETED `address_in_list`, `apply_ctor_overrides`, `lookup_override`, `build_lit_blobs`, `build_ref_idxs_mapped`, `is_blob`, `ctx_convert_expr` — all unused after the migrations above. TRIED + REVERTED (left as notes in code where instructive): * Fused `build_addr_pos_map` + `augment_with_blob_refs` single-pass. Lost to Aiur's per-row width tax — two narrower passes cheaper than one merged wide pass. * `convert_expr(ctx: &ConvertCtx)` ptr. Per-arm `load(ctx)` cost > raw-arg threading. * `compute_layout_walk` seen-mptrs/seen-poses → rbtree. Small regression on typical shards (few enough Muts blocks that O(N) list scan is already cheap; rbtree value column tax > savings). * `block_members_map` to skip per-projection `load_verified_constant`. Lost because Aiur memoises `load_verified_constant` automatically — the new rbtree probe's width tax exceeded the savings from the eliminated match cascade. * `ctx` everywhere as `&ConvertCtx` (with `store(...)` at 5 construction sites). The added `store` rows offset the per-row savings — wash. `lake test -- --ignored ixvm`: 42 pin shifts, all measurable wins vs the `ap/blob-ref-rbtree-augment` tip. Cumulative on heavy consts: * `Nat.add_comm`: 54_369_745 → 54_049_773 FFT (-0.59%) * `Nat.sub_le_of_le_add`: 515_331_420 → 510_843_459 FFT (-0.87%) * `Array.append_assoc`: 2_567_087_893 → 2_537_360_311 FFT (-1.16%) * `Vector.append`: 2_639_286_078 → 2_607_800_745 FFT (-1.19%) * `String.Internal.append`: 718_803_075 → 708_296_270 FFT (-1.46%) * `IxVMInd.Even.rec`: 31_659_493 → 31_434_525 FFT (-0.71%) * `_private....extractMainModule._unsafe_rec`: 1_081_617_705 → 1_064_689_765 FFT (-1.57%)

Replaces channel 0's tagged-union (claim / tree / const bytes / empty- blob-marker, distinguished only by key content-hash collision- impossibility) with one channel per value kind. Channel reorganization is cost-neutral — all six per-channel `io_get_info` + `#read_byte_stream` pairs are inlined at the single consumer of each channel, so no fn-call row gets added. New layout, tiered by access pattern: | Tier | Channel | Purpose | Key | Value | |-------|---------|--------------------------|-------------------------|----------------------| | Ctrl | 0 | claim wire bytes | `blake3(claim_bytes)` | claim bytes | | Ctrl | 1 | assumption tree bytes | `tree.root` | tree bytes | | Const | 2 | constant wire bytes | const addr | const bytes | | Const | 3 | Defn reducibility hint | Defn addr | single G | | Blob | 4 | blob discriminator | addr | one byte (1=const, 0=blob) | | Blob | 5 | blob raw bytes | blob addr | raw bytes | Tier 1 = one-per-`verify_claim`. Tier 2 = per-const. Tier 3 = per-blob. `io_get_info(channel, key)` is now unambiguous by channel alone. Aiur side (`Ingress.lean`, `Kernel/Claim.lean`): * `load_verified_constant` inlines `io_get_info(2, raw)` + `#read_byte_stream(2, ...)` (was ch 0). * `load_verified_blob` inlines ch 5 read (was ch 1). * `load_constant_hint` inlines ch 3 read (was ch 2). * `load_with_deps` inlines a ch 4 discriminator probe replacing the former `io_get_info(0, addr).len == 0 ⟹ blob` hack on ch 0. One read_byte for the per-addr discriminator byte. * `load_verified_claim` inlines ch 0 (was using a shared `load_payload_const` helper on ch 0). * `load_assumption_tree` inlines ch 1 (was using same `load_payload_const` helper on ch 0). * Helper wrappers (`load_payload_claim/tree/const/hint/blob`, `load_discriminator`, `ch_*` constants) all removed — every channel has exactly one consumer site and the inlined dispatch costs no fn call. * Top-of-file `IxVM IOBuffer interface` doc block documents the layout + soundness model. Lean side (`ClaimHarness.lean`): * `addEntries` writes per channel — consts → ch 2 + per-addr discriminator `[1]` on ch 4; blobs → ch 5 + per-addr discriminator `[0]` on ch 4; Defn hints → ch 3. * `seedTreeAt` writes tree bytes to ch 1 (was ch 0). * `buildClaimWitness` writes claim bytes to ch 0 (unchanged). * Drops the empty-marker write on ch 0 for blob addrs — the ch 4 discriminator covers blob/const classification explicitly. * Top-of-section `IxVM IOBuffer interface` doc block mirrors the Aiur-side comment. Soundness model unchanged. ch 0/1/2/5 byte streams are blake3-verified by the kernel against their content-addressed keys. ch 3 hint is semantically optional (controls WHNF reduction heuristic only; def-eq is sound either way). ch 4 discriminator is sound by erasure- correctness — a lying byte flips the const/blob decision and the wrong-path load downstream fails (a "const" blob triggers a ch 2 read that returns empty → blake3 verify against the non-empty addr fails; a "blob" const dangles references → typecheck fail). `lake test -- --ignored ixvm` green; 42 pins shifted within ±0.010% (noise-tier) vs the pre-shuffle baseline — channel reorganization is cost-neutral as required, the discriminator's one read_byte trades against the eliminated `io_get_info(0, ...).len` check.

Adds a new Aiur pipeline stage that translates `Bytecode.Toplevel` into a Rust source module, one `fn aiur_fn_N` per Aiur function. The generated code mirrors `src/aiur/execute.rs`'s QueryRecord side effects exactly: same `function_queries.insert` timing, same cache-hit multiplicity bumps, same memory-queries insertion order, same `bytes{1,2}_queries` updates, same IO sequencing. Per-witness trace hashes match the interpreter byte-for-byte on every const tested (`Nat.add_comm`, `Vector.append`, `Std.Time.Week.Offset.ofMilliseconds`). * `Ix/Aiur/Stages/Codegen.lean`: structured Rust IR (`RustExpr` / `RustStmt` / `RustItem` / `MatchArm` / labeled blocks) plus a formatter. Codegen is a structural walk over the Bytecode AST, no string templating. `EmitM = StateM EmitState` threads `(nextVal, nextLabel)` so per-ValIdx Rust locals and per-MC labels are fresh. * Aiur's `map: Vec<G>` is gone in the generated kernel: every ValIdx becomes a Rust local `__v_{i}: G`. Match-arm bodies snapshot `nextVal` on entry so per-arm allocations don't leak to siblings. `MatchContinue` / `Yield` use labeled-block + `break 'label [G; OUT_SIZE]` to bubble yielded values out and rebind them at outer scope. * Deep Aiur recursion uses `stacker::maybe_grow(64 KiB, 4 MiB, …)` at the top of every generated fn, so the native call stack grows on demand rather than pre-reserving a giant thread stack. Verified against shard 24 of the 64-way `init.ixes` partition, which SEGFAULTs without it. * `ix codegen`: new CLI command. Compiles the IxVM Aiur source, walks the bytecode, writes the generated Rust to a fixed path (`src/ix/aiur_ixvm.rs`). The output path is hard-coded; no override. * `src/ix/aiur_ixvm.rs`: the generated kernel, 743 `aiur_fn_*` + `execute_generated` dispatch. Auto-regenerated; do not edit. `#![cfg_attr(rustfmt, rustfmt::skip)]` + a broad `#![allow(unused_*, non_snake_case, clippy::all)]` since the layout is for the compiler, not humans. * `src/ix/aiur_ixvm_runner.rs`: `execute_ixvm(toplevel, fun_idx, args, io_buffer) -> Result<(QueryRecord, Vec<G>), ExecError>`. Same return shape as `Toplevel::execute`, but routes through `execute_generated`. * `src/aiur/synthesis.rs`: `AiurSystem::prove_ixvm(...)` — same shape as `prove`, but the execute step calls `execute_ixvm`. Verification-compatible (proofs from one path verify under the other). * `src/aiur/execute.rs`: helpers exposed for the codegen'd kernel (`bytes{1,2}_execute` → `pub(crate)`, `unconstrained_big_uint_div_mod_helper` extracted, `CodegenBytes{1,2}{Op,}` aliases re-exported, `QueryRecord::new` → `pub(crate)`, `ExecError::InvalidFunIdx` added). * `src/ffi/aiur/protocol.rs`: two new FFI exports — `rs_aiur_toplevel_execute_ixvm` and `rs_aiur_system_prove_ixvm`. Same wire format as the existing `_execute` / `prove` exports. * `Ix/Aiur/Semantics/BytecodeFfi.lean`: `Bytecode.Toplevel.executeIxVM` — mirror of `execute`. * `Ix/Aiur/Protocol.lean`: `AiurSystem.proveIxVM`. * `Ix/Cli/CheckCmd.lean`: `runCompiled` now dispatches through `executeIxVM`. The Rust bytecode interpreter is no longer reachable from `ix check` (the Lean-side `--interp` fallback stays for richer error diagnostics). * `Ix/Cli/ProveCmd.lean`: `proveOne` uses `proveIxVM`. * `Cargo.toml`: `stacker = "0.1"`. For witnesses where execute is the dominant work (single-const checks via `ix check 'X'`), the codegen'd kernel is ~1.6× faster than the bytecode interpreter end-to-end (measured on `Nat.add_comm`, `Vector.append`, `Std.Time.Week.Offset.ofMilliseconds`). Tiny consts are within ~10% (stacker probe overhead amortises). Peak RSS is within a few MB of the interpreter — stacker grows the stack on demand, no giant upfront reservation. For shard-driven `ix check --shard K` runs the speedup is invisible: per-shard wall time is dominated (~92%) by `buildShardCheckEnvWitness` in Lean, NOT by the kernel execute (~8%). Cutting witness construction is the next lever. `Nat.add_comm` proof produced via `proveIxVM` verifies under the existing `AiurSystem::verify`. End-to-end: lake exe ix prove 'Nat.add_comm' → proof addr 0d0ab0f9… lake exe ix verify 0d0ab0f9… → ok

Replaces `IxVM.ClaimHarness.buildShardCheckEnvWitness` (Lean side, ~92% of shard wall time on heavy partitions) with a Rust port that builds the `aiur::execute::IOBuffer` directly, without per-byte boxing into Lean `Aiur.G` values. The two hot phases run on rayon: * Closure walk: each owned addr's transitive `Constant.refs` + projection-block traversal runs on its own thread; results are deduped through a `dashmap::DashSet`. * Byte-to-G conversion: per-const `(key, data)` tuples are built in parallel chunks of 256; final IOBuffer assembly (channel arena append + key→`IOKeyInfo` map insert) runs serially because the arena index is monotonic. * `src/ix/aiur_ixvm_witness.rs` (new): `build_shard_check_env_witness` produces `(claim, claim_digest_input, io_buffer)` ready to feed to `execute_ixvm`. Mirrors the 6-channel layout documented in `Ix/IxVM/ClaimHarness.lean` (claim / asm tree / const bytes / Defn hint / blob discriminator / blob raw bytes). * `src/ffi/aiur/protocol.rs`: new FFI `rs_aiur_toplevel_shard_check_ixvm` bundles witness build + `execute_ixvm` into one cross-language trip. Returns the same `(output, ioBuffer, queryCounts)` shape as `rs_aiur_toplevel_execute_ixvm`, so the Lean shim stays drop-in compatible. * `Ix/Aiur/Semantics/BytecodeFfi.lean`: `Bytecode.Toplevel.shardCheckIxVM` wraps the FFI. * `Ix/Cli/CheckCmd.lean`: shard-mode `ix check` (`--ixe + --ixes`) now dispatches through `runShardOwnedNative` → `shardCheckIxVM`, bypassing the Lean witness builder. Single- shard and whole-partition paths share the fast route; the legacy `runShardCheckManifest` / `runShardCheckAll` callbacks stay live for `--interp` only. Shard 26 of the 64-way `init.ixes` partition, on the same host: | Variant | total | speedup | | --- | --: | --: | | Lean witness + bytecode-interp (baseline)| 1029 s | 1.0× | | Lean witness + codegen kernel | 1015 s | 1.01× | | Rust witness (serial) + codegen | 101 s | 9.95× | | Rust witness (parallel) + codegen | 80 s | 12.9× | The serial Rust witness collapses the 935 s of Lean per-byte boxing into ~23 s; rayon hides that 23 s under the codegen-kernel execute (~78 s) so witness build effectively becomes free at the shard level. FFT cost matches the bytecode interpreter exactly (107_006_963_281), so the QueryRecord layout is bit-identical across all four variants. * Closure walk uses single-source BFS with the global visited set shared via `DashSet::contains` checks before pushing to the per-thread stack — avoids redundant work without a per-owned union step. * Per-channel ordering: each chunk's partial `ChannelEntries` feeds the serial fold in iteration order, so channel arena contents match the Lean side's exactly. Test still relies on trace-hash parity from the earlier codegen commit, which exercised the same path through the bytecode interpreter and the codegen kernel. * Coverage check is currently skipped on the IxVM-native whole-partition path (`runShardManifestAllNative`); legacy `runShardCheckAll` still does it for `--interp`. Re-enable separately if needed.

plumbing, unchecked cache-hit array copy Three targeted cuts to the generated kernel (`src/ix/aiur_ixvm.rs`); all preserve QueryRecord parity (shard 26 FFT cost unchanged at 107_006_963_281). Every `U8*` op previously emitted a `Vec<G>` scratch (~245 sites): let __b2_out: [G; 1] = { let mut __scratch: Vec<G> = vec![__v_i, __v_j]; if unconstrained { __scratch.extend(vec![Bytes2::xor(...)]); } else { bytes2_execute(0, 1, &Bytes2Op::Xor, &mut __scratch, record); } let __arr: [G; 1] = __scratch[2..].try_into().unwrap(); __arr }; Net cost: a 2-element `vec![]` alloc + a `__scratch.extend(vec![..])` (another small alloc inside `Bytes2::execute`'s returned Vec) + a slice→array `try_into().unwrap()`. Added per-op `bytes{1,2}_*_value(args..., record) -> G | (G,G) | (G,G,G) | [G; 8]` in `src/aiur/execute.rs` — each bumps the corresponding `bytes{1,2}_queries.bump_*` and returns the gadget output by value. The pure (`Bytes{1,2}::*`) helpers stay the unconstrained shortcut. Codegen now emits: let __v_n: G = if unconstrained { Bytes2::xor(&__v_i, &__v_j) } else { bytes2_xor_value(__v_i, __v_j, record) }; Zero `Vec<G>` allocation per byte op. `U8Add`/`U8Sub` also gain because `bytes2_add_value` runs the `Bytes2::add` gadget once (returning the full `(low, carry)`) instead of the interpreter's two `Bytes2::add` calls (one for the carry, one for the low push). To make `bump_*` callable from `execute.rs`, all `Bytes1Queries` / `Bytes2Queries::bump_*` methods are now `pub(crate)`. No behaviour change. The `Op::Call` cache-check emitted `unconstrained || OP_UN` and `!unconstrained && !OP_UN` even when the static `OP_UN` was `false`, which is the common case. Codegen now emits the folded shape directly: let __cu = unconstrained; // was: unconstrained || false if !unconstrained { *result.multiplicity += G::ONE; } // was: !unconstrained && !false When `OP_UN == true`, the callee always runs unconstrained AND the multiplicity bump is always suppressed; codegen folds to `let __cu = true;` and elides the bump branch entirely. LLVM was already folding these, so this is mostly source-cleanliness and a small frontend win. The cache-hit branch read `result.output` (an `&[G]`) and converted it back to `[G; OUT_N]` via `.try_into().unwrap()`. The length is statically `OUT_N` — we control the producer (the matching `aiur_fn_{callee}::Ctrl::Return` inserts an `[G; OUT_N]`-typed array into the same slot). The runtime bounds-check + Result discharge is dead work. Codegen now emits: let __ret: [G; OUT_N] = unsafe { *(result.output.as_ptr() as *const [G; OUT_N]) }; Sound: same-fn slot, fixed `OUT_N`, no aliasing. | Variant | mean | | --- | --: | | Rust witness (parallel) + codegen | 80 s | | + value-helper byte ops (#1) | 64 s | | + folded `__cu` + unsafe cache copy (#2+#3) | 62 s | The bulk of the win is #1 (#1 alone: ~−18%). #2+#3 are a further ~3% — mostly noise / Rust-frontend cleanup since LLVM already folds the constant `false` ops. * `src/aiur/execute.rs`: 12 value-returning byte helpers added (`bytes1_bit_decompose_value`, `bytes1_shift_left_value`, `bytes1_shift_right_value`, `bytes2_{xor,and,or,less_than,mul, chain_rotr7,chain_rotr4,add,sub}_value`). * `src/aiur/gadgets/bytes1.rs`, `src/aiur/gadgets/bytes2.rs`: all `bump_*` upgraded to `pub(crate)` so the value helpers can call them. * `Ix/Aiur/Stages/Codegen.lean`: `emitU8Bytes1` / `emitU8Bytes2` / `emitU8Add` / `emitU8Sub` rewritten to call the per-op value helpers — no scratch Vec, no slice→array conversion. `emitCall` constant-folds `opUn = false` and emits the unsafe cache-hit copy. Prelude imports updated to bring the new helpers into scope. * `src/ix/aiur_ixvm.rs`: regenerated. 245 `Vec<G>` scratch sites → 11 (only `unconstrainedBigUintDivMod`'s scratch left); 3324 `unconstrained || false` → 0; 3000+ `result.output.try_into() .unwrap()` → 0. * `Ix/Cli/CheckCmd.lean`: incidental cleanup of probe-only timing prints in `runShardOwnedNative` (added during measurement, no longer needed).

Per-claim mode (a `Claim.check addr none`) builds a closure rooted at the target address. That closure can be the whole environment when the target is a heavily-shared root constant. The Lean witness builder paid per-byte boxing into `Aiur.G` for every byte in that closure — the same cost we already eliminated for shard mode. # Surface Five new FFIs, all bundling witness build + execute_ixvm (and STARK prove where applicable) into one cross-language trip so the `IOBuffer` never crosses the boundary mid-pipeline: * `rs_aiur_toplevel_check_addr_ixvm` — `Bytecode.Toplevel.checkAddrIxVM (toplevel) (funIdx) (ixePath) (addrBytes)`: per-claim check; builds witness for `Claim.check addr none` via `build_claim_check_witness`, runs `execute_ixvm`. Same return shape as `shardCheckIxVM`. * `rs_aiur_toplevel_check_env_bytes_ixvm` — same as above but takes the env as a serialized byte blob (`Ixon.serEnv`) instead of a `.ixe` path. Used by the compiled-Lean-env code path (`ix check NAME` without `--ixe`), where the env is built in Lean memory. * `rs_aiur_system_prove_addr_ixvm` — `AiurSystem.proveAddrIxVM (...)`: per-claim prove (witness + execute + STARK prove all in Rust). * `rs_aiur_system_prove_env_bytes_ixvm` — bytes-blob counterpart. * `rs_aiur_system_shard_prove_ixvm` — `AiurSystem.shardProveIxVM (...)`: per-shard prove end-to-end in Rust. `build_claim_check_witness` lives next to `build_shard_check_env_witness` in `src/ix/aiur_ixvm_witness.rs` and uses the same parallel closure walk + parallel byte→G conversion. The bytes-blob FFIs additionally harvest `anon_hints` from each `Def` named entry after `Env::get` — `Env::get` (full form, used by the bytes-blob path) doesn't populate `env.anon_hints` the way `Env::get_anon` does, but the kernel's `verify_claim` reads ch 3 (Defn reducibility hints) so the hints must end up in the env. This mirrors the harvest pass already inside `get_anon` at `src/ix/ixon/serialize.rs:1683`. # Wiring `Ix.Cli.CheckCmd.WitnessSource`: a three-arm discriminated union threaded through `forEachClaim` (the shared check/prove driver): * `.native ixePath addr` — `.ixe`-backed env, Rust mmap. * `.nativeBytes envBytes addr` — Lean-memory env serialized to bytes, Rust decodes via `Env::get`. * `.lean witness` — pre-built `ClaimWitness` (fallback). `runCompiled` and `proveOne` dispatch on the source. # Coverage | Command | Path | |---------------------------------------|-----------------------------------------------| | `ix check NAME --ixe` | **`checkAddrIxVM`** (new) | | `ix check NAME` (no `--ixe`) | **`checkEnvBytesIxVM`** (new) | | `ix check --claim hex --ixe` | `checkAddrIxVM` for `check addr none` | | `ix check --ixes --shard K` | `shardCheckIxVM` (existed) | | `ix check --ixes` | `shardCheckIxVM` (existed) | | `ix prove NAME --ixe` | **`proveAddrIxVM`** (new) | | `ix prove NAME` (no `--ixe`) | **`proveEnvBytesIxVM`** (new) | | `ix prove --ixes --shard K` | **`shardProveIxVM`** (new) | | `ix prove --ixes` | **`shardProveIxVM`** loop (new) | | `Benchmarks/Typecheck.lean` Phase 1 | `checkAddrIxVM` / `executeIxVM` | | `Benchmarks/Typecheck.lean` Phase 2 | `proveAddrIxVM` / `proveIxVM` | | `Benchmarks/IxVM.lean` | `proveIxVM` (was `prove`) | `--interp` route is preserved: it materialises any `WitnessSource` back into a `ClaimWitness` before driving the Aiur source interpreter. `--claim <hex>` over non-`check addr none` variants (`eval` / `reveal` / `contains` / `checkEnv-with-asm`) still uses the Lean witness builder. # Sanity * `ix check Nat.add_comm --ixe init.ixe` (warm): 3.1 s wall, FFT = 23_603_449. * `ix check Nat.add_comm` (compiled env via bytes blob, warm): 2.4 s wall, FFT = 23_603_449. # Known overheads (tracked for a follow-up redesign) * **Per-claim env re-parse on `--ixe + many names`**: each FFI call re-runs `Env::get_anon_mmap` on the same path. Mathlib-scale iteration pays the lazy-index build N times instead of once. * **`--interp + .nativeBytes` round-trip**: Lean serializes the compiled env then `materialise` deserializes it back, just to feed `runInterp`. * **`runShardProveNative` claim reconstruct**: redoes the closure walk + canonical `AssumptionTree` build Lean-side after `shardProveIxVM` already computed the same claim internally. A future `EnvHandle`-based API would close all three: build the env once into a Rust-owned handle, pass the handle to every per-claim/per-shard FFI, and have prove FFIs return the serialized claim bytes.

Before: every per-claim and per-shard FFI re-parsed the Ixon env (`Env::get_anon_mmap` per call). On `--ixe + many names` / all-shards prove, this is the dominant overhead for iteration-heavy workflows — O(num_consts) lazy-index build × N targets. After: the env lives once per CLI invocation in a Rust-owned `EnvHandle`. Lean holds an opaque `Aiur.EnvHandle` reference and threads `@& EnvHandle` through every per-target FFI call. The env is parsed exactly once at handle construction; downstream calls share it. # Surface Five new FFIs collapse the previous six per-call variants (`checkAddrIxVM`, `checkEnvBytesIxVM`, `shardCheckIxVM`, `proveAddrIxVM`, `proveEnvBytesIxVM`, `shardProveIxVM` — all deleted): * `rs_aiur_env_handle_from_ixe(path) → LeanExternal<EnvHandle>`: mmap-load via `Env::get_anon_mmap`. Anon parser already harvests `anon_hints`; no post-pass. * `rs_aiur_env_handle_from_bytes(blob) → LeanExternal<EnvHandle>`: decode `Ixon.serEnv`-shape blob via `Env::get` + harvest `anon_hints` from each `Def` named entry. Used by the compiled-Lean-env path. * `rs_aiur_toplevel_check_addr_with_env(toplevel, fun_idx, handle, addr_bytes)`: per-claim check. Reuses handle's parsed env. * `rs_aiur_toplevel_shard_check_with_env(toplevel, fun_idx, handle, owned_blob)`: per-shard check. * `rs_aiur_system_prove_addr_with_env(system, fri, fun_idx, handle, addr_bytes) → (claim_bytes, proof, ioBuffer)`: per-claim prove. Rust serializes the reconstructed `Ix.Claim` via `ixon::Claim::put` so Lean can deserialize via `Ixon.runGet Ix.Claim.get` directly — no closure walk + canonical `AssumptionTree` recomputation Lean-side. * `rs_aiur_system_shard_prove_with_env(system, fri, fun_idx, handle, owned_blob) → (claim_bytes, proof, ioBuffer)`: per-shard prove. # Lean * `Aiur.EnvHandle` opaque type with `fromIxe` / `fromBytes`. * `Aiur.Bytecode.Toplevel.checkAddrWithEnv` / `shardCheckWithEnv`, `Aiur.AiurSystem.proveAddrWithEnv` / `shardProveWithEnv`. * `Ix.Cli.CheckCmd.Target` replaces `WitnessSource`: ```lean inductive Target where | addr (a : Address) -- Claim.check addr none | shard (owned : Array Address) -- Claim.checkEnv | leanW (w : ClaimWitness) -- --interp, --claim non-check ``` `runCompiled` / `proveOne` take `(envHandle?, target)`. The envHandle is `none` only for `.leanW` (`--interp` legacy path). * `runShardOwnedNative` and the manifest-driver helpers take the envHandle as a parameter. Single-shard mode builds it once; all-shards mode reuses the same handle across every shard's FFI call (eliminates per-shard re-mmap). * `runShardProveNative` deserializes the wire claim bytes via `Ixon.runGet Ix.Claim.get` instead of re-running `shardCheckEnvClaim`. # Benchmarks `Benchmarks/Typecheck.lean` builds the envHandle once before Phase 1, reuses it across Phase 1 (execute) + Phase 2 (prove). Both phases now go through `checkAddrWithEnv` / `proveAddrWithEnv` on the full-closure path. `--subject-only` still uses Lean `buildVerifyConst` + `executeIxVM` / `proveIxVM` (witnesses intentionally small there). # New crate `crates/ix/src/env_handle.rs` — the `EnvHandle` struct + constructors live in the existing `ix` crate next to the witness builder and codegen runner. # Sanity * Multi-target check (warm): `ix check --ixe init.ixe Nat.add_comm Nat.add` → 3.3 s wall, single envHandle shared across both targets. * Shard 26 check (warm): `ix check --ixe init.ixe --ixes init.ixes --shard 26` → ~78 s wall, FFT = 107_006_963_281 (parity preserved). # Coverage | Command | Path | |--------------------------------------|-----------------------------------| | `ix check NAME --ixe` | `checkAddrWithEnv` | | `ix check NAME` (no `--ixe`) | `checkAddrWithEnv` + fromBytes | | `ix check --claim hex --ixe` | `checkAddrWithEnv` for check-none | | `ix check --ixes --shard K` | `shardCheckWithEnv` | | `ix check --ixes` | `shardCheckWithEnv` × all shards | | `ix prove NAME --ixe` | `proveAddrWithEnv` | | `ix prove --ixes --shard K` | `shardProveWithEnv` | | `ix prove --ixes` | `shardProveWithEnv` × all shards | | `Benchmarks/Typecheck` Phase 1+2 | `checkAddrWithEnv` / `proveAddrWithEnv` | Only `--interp` and `--claim hex` over a non-`check addr none` persisted claim still build a Lean `ClaimWitness`; both go via the `.leanW` target arm.

`ix codegen --check` compares the emitted Rust source against the on-disk `crates/ix/src/aiur_ixvm.rs` and exits 0 if identical, 1 otherwise. No write side effect. ~2 s warm; fast enough to gate on every PR. Wired into the `lean-test` CI job so a forgotten regen on a kernel-touching PR fails CI instead of merging stale generated code that drifts from the Bytecode → Rust emitter.

Two interpreter modes behind a single `--interp` flag: * `--interp source`: Aiur source interpreter (`Aiur.runFunction` over the source-level `Decls`). Richer per-step error diagnostics. * `--interp bytecode`: generic Aiur bytecode interpreter (`Bytecode.Toplevel.execute`, `rs_aiur_toplevel_execute` route). Skips the `ix codegen` + `cargo build --release` cycle needed after editing `Ix/IxVM/*.lean` — the bytecode is rebuilt Lean-side at exe load. Slower per-check than the codegen kernel; ideal for tight iteration on the IxVM source. * omit the flag entirely for the native codegen kernel (default). Invalid values (e.g. `--interp foo`) fail fast with a clear error. # Plumbing The `check_addr_with_env` and `shard_check_with_env` FFIs take a `use_bytecode: bool` and dispatch via a shared `dispatch_execute` helper. `--interp bytecode` sets it to `true`. The `.leanW` arm of `runCompiled` also picks between `Bytecode.Toplevel.execute` and `executeIxVM` based on the same flag. The `--interp source` path requires a Lean `ClaimWitness`. The existing driver had switched to `.addr`/`.shard` targets against the Rust-owned `EnvHandle`; source-interp had no way to materialise those. `forEachClaim` now takes a `forceLeanWitness : Bool` — when true, it builds a Lean witness for every target (via `mkWitness` / `loadIxonEnv` for the compiled-Lean-env path) and passes `.leanW` so `runInterp` can consume it directly. # Sanity * `ix check Std.Time.Week.Offset.ofMilliseconds` (warm): 9.3 s wall, FFT = 12_430_516_949 (codegen kernel). * `ix check --interp bytecode Std.Time.Week.Offset.ofMilliseconds` (warm): 14.6 s wall, FFT = 12_430_516_949 (bytecode interp). * `ix check --interp source Eq` (warm): 6.3 s wall, output `Eq: ()`. * `ix check --interp garbage Nat.add_comm`: exits 1 with clear error.

arthurpaulino added 11 commits June 30, 2026 12:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

IxVM: native execution pipeline#463

IxVM: native execution pipeline#463
arthurpaulino wants to merge 11 commits into
mainfrom
ap/codegen-ixvm-native

arthurpaulino commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

arthurpaulino commented Jul 1, 2026

What's here

Coverage

FFI surface

Measurements

New crate

CI

Notes for reviewers

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant