
TA eval: result-shape-constraint shaped-product hook (nonnull-pair) #565

Open
evaleev wants to merge 9 commits into master from feature/result-shape-nonnull-pair-constraints

evaleev commented Jun 27, 2026

SeQuant half of the nonnull-pair result-shape-constraint feature (the mpqc side is ValeevGroup/mpqc4 PR for feature/result-shape-nonnull-pair-constraints).

Stacked on #559 (feature/cost-model-batch-aware). Base is set to that branch so the diff is just the 6 result-shape commits; retarget to master once #559 merges.

What this adds

Adds a method-supplied, opaque result-shape provider, reached through the TA backend context, that lets a consumer impose a TA::SparseShape on a binary-Product node's result during sequant::evaluate. The generic eval/CacheManager code stays TA-free; all TA specifics live in TAEvalContext and the hook closure.

Commits:

  • eeca28641, dd1706c14 — de-risk spikes: standard-layer ToT product + T×ToT + dot_inner denest honor an imposed SparseShape.
  • 47a3b4607 — thread an optional TA result-shape provider to the binary-product site (type-erased shaped_product_hook on CacheManager; eval.hpp/cache_manager.hpp name only ResultPtr/Result/std::any).
  • bcbe360d1 — emit the shape-constrained product via the standard expression layer ((l*r).set_shape(s) / dot_inner(...).set_shape(s)); empty hook ⇒ byte-identical default.
  • dd6847195 — make_hook declines scalar-operand products (which have no TiledRange) before the trange computation, fixing a segfault that occurred as soon as a real provider was active.
  • 3709ee68f — recognize the arena-inner-tile ToT kind (DistArray<Tensor<ArenaTensor<NumericT>>>) via an InnerTileT template parameter (defaulting to Tensor<NumericT>, which leaves existing behavior unchanged), so the hook fires on CSV/PNO intermediates instead of declining them. Also adds a graceful decline (tot_product_needs_inner_reorder) when the ToT general product would need a non-identity inner result permutation that TA can't yet emit; such nodes fall through to einsum prod(), which is lossless.

Scope / safety

  • Default behavior is byte-identical: with no provider the hook returns nullptr immediately.
  • Sparse-policy / TA-backend only; the generic eval path is untouched when the hook is empty.

Tested

Eval-level shape spikes (560 assertions). End-to-end via the mpqc consumer: closed-shell CSV-CCk (PNO-CCSD) is lossless with the constraint ON vs OFF, and the targeted (g.C)(g.C) giant intermediate shrinks to its surviving-pair support (3× smaller on a water-trimer test) with the energy preserved to ~1e-14.

🤖 Generated with Claude Code

https://claude.ai/code/session_01Y9QnUcKzvPp5bJSS5hvCyc

evaleev added 7 commits June 25, 2026 22:07
Adds TEST_CASE("shape_spike_tot_general_product", "[shape-spike]") that
de-risks the core assumption of the result-shape-constraints feature: that a
ToT general product can be evaluated through TiledArray's standard expression
layer (A(la) * B(ra)) with an imposed SparseShape via .set_shape(s), rather
than through TA::einsum.

The test uses the same ToT*ToT->ToT annotation as the existing
ToT_times_ToT_to_ToT section (contraction over outer i_3 and inner a_4;
Hadamard i_1,i_2; result outer (i_2,i_1)), but with SparsePolicy and a
multi-tile outer TiledRange (2 tiles per occ mode) so that 4 outer result
tiles exist and tile (0,0) can be masked to zero by the imposed shape.

Outcome: PASS -- (A(la)*B(ra)).set_shape(s) evaluates without throwing,
honors the imposed mask (tile (0,0) is absent in the result), and the
surviving tiles match TA::einsum to floating-point precision.
130 assertions pass.
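To illustrate what the spike asserts, here is a minimal plain-C++ sketch of imposed-shape semantics that assumes nothing about TiledArray internals: `TiledMatrix`, `Shape`, and `shaped_product` are hypothetical names, tiles are dense row-major blocks, and an empty tile stands for a zero tile. Result tiles the shape masks out are never computed, and every surviving tile equals the unshaped product.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical block-sparse matrix: an nt x nt grid of dense ts x ts tiles
// (row-major). An empty tile vector stands for a zero (absent) tile.
struct TiledMatrix {
  std::size_t nt, ts;
  std::vector<std::vector<double>> tiles;
  TiledMatrix(std::size_t nt_, std::size_t ts_) : nt(nt_), ts(ts_), tiles(nt_ * nt_) {}
  std::vector<double>& tile(std::size_t i, std::size_t j) { return tiles[i * nt + j]; }
  const std::vector<double>& tile(std::size_t i, std::size_t j) const { return tiles[i * nt + j]; }
  bool is_zero(std::size_t i, std::size_t j) const { return tile(i, j).empty(); }
};

// Hypothetical imposed shape: keep[i * nt + j] == false forces result tile (i, j)
// to zero. An empty Shape means "no constraint" (the unshaped default).
using Shape = std::vector<bool>;

// C = A * B, computing only the result tiles the shape keeps.
TiledMatrix shaped_product(const TiledMatrix& A, const TiledMatrix& B, const Shape& keep = {}) {
  const std::size_t nt = A.nt, ts = A.ts;
  TiledMatrix C(nt, ts);
  for (std::size_t i = 0; i < nt; ++i) {
    for (std::size_t j = 0; j < nt; ++j) {
      if (!keep.empty() && !keep[i * nt + j]) continue;  // masked out: never computed
      std::vector<double> c(ts * ts, 0.0);
      bool nonzero = false;
      for (std::size_t k = 0; k < nt; ++k) {
        if (A.is_zero(i, k) || B.is_zero(k, j)) continue;  // skip zero operand tiles
        nonzero = true;
        const auto& a = A.tile(i, k);
        const auto& b = B.tile(k, j);
        for (std::size_t r = 0; r < ts; ++r)
          for (std::size_t s = 0; s < ts; ++s)
            for (std::size_t t = 0; t < ts; ++t) c[r * ts + t] += a[r * ts + s] * b[s * ts + t];
      }
      if (nonzero) C.tile(i, j) = std::move(c);
    }
  }
  return C;
}
```

Masking result tile (0,0) with `Shape{false, true, true, true}` leaves it zero, while tiles (0,1), (1,0) and (1,1) match the unshaped `shaped_product(A, B)` exactly: the same ON-vs-OFF comparison the spike makes against `TA::einsum`.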

Adds two more [shape-spike] test cases covering the other two contraction
kinds that the SeQuant TA backend's prod() emits as shape-eligible
intermediates:

Case A (shape_spike_T_times_ToT_general_product): T x ToT -> ToT mixed
operand product (flat DF-integral-like g times PNO-coefficient-like ToT C).
(T_op(la) * ToT_op(ra)).set_shape(s) evaluates, honors the imposed
SparseShape, and matches the TA::einsum baseline on surviving tiles.

Case B (shape_spike_ToT_inner_contraction_to_flat_T): ToT x ToT with the
inner (composite) indices fully contracted and the outer (occ) indices
surviving, denesting to a flat tensor-of-scalars result -- the
einsum<DeNest::True> / dot_inner path (result.hpp:581). The standard-layer
equivalent is the .dot_inner() expression; DotInnerExpr derives from Expr so
it exposes set_shape(), and the override is honored:
  C(c) = A(a + inner.a).dot_inner(B(b + inner.b)).set_shape(s);
evaluates, honors the shape, and matches the einsum<DeNest::True> baseline.

Both PASS. 280 assertions across 3 shape-spike cases.
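As a rough model of the Case B denesting, here is a hedged plain-C++ sketch (not TiledArray code; `Inner`, `ToT`, and `dot_inner_denest` are hypothetical). Each outer element holds an inner vector, the outer index survives as a Hadamard index, and the inner index is fully contracted, so each outer element collapses to a single scalar. The inner extents may differ from one outer element to the next, as they would for PNO-like data.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical tensor-of-tensors: outer element k holds an inner vector.
using Inner = std::vector<double>;
using ToT = std::vector<Inner>;

// Outer index k is kept (Hadamard); the inner index is contracted away, so the
// result is a flat vector with one scalar per outer element.
std::vector<double> dot_inner_denest(const ToT& A, const ToT& B) {
  assert(A.size() == B.size());
  std::vector<double> C(A.size(), 0.0);
  for (std::size_t k = 0; k < A.size(); ++k) {
    assert(A[k].size() == B[k].size());  // inner extents must agree per element
    C[k] = std::inner_product(A[k].begin(), A[k].end(), B[k].begin(), 0.0);
  }
  return C;
}
```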

…ct site

Add TAEvalContext (SeQuant/core/eval/backends/tiledarray/eval_context.hpp)
holding a result_shape_provider callback (node x trange -> optional<SparseShape>).
Thread it to the binary-Product site in evaluate() via a new type-erased
product_node_visitor field on CacheManager: the visitor is invoked with
std::any(std::cref(node)) at each Product node before prod() is called.
An empty visitor (the default) is a no-op; existing callers compile and
behave identically.

Chosen mechanism: cache-carried visitor (option a from the design brief).
Rationale: CacheManager is already threaded to the binary-product site
(custom_evaluator follows the same pattern); no evaluate() signature changes
are needed; the std::any wrapping keeps TA types out of generic eval headers.

Test [shape-provider]: ToT*ToT->ToT eval with a provider that increments a
counter and returns nullopt; asserts counter>=1 (provider reached) and that
the result equals the no-visitor reference (nullopt => no behavior change).
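Below is a minimal sketch of the seam, using hypothetical stand-ins (`Node`, `CacheManagerSketch`, `node_of`, `evaluate_product_sketch`) rather than SeQuant's types. The generic side wraps the node as `std::any(std::cref(node))`, and the provider side unwraps the `std::reference_wrapper` to reach the node without copying it.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-ins for a SeQuant eval node and the CacheManager field.
struct Node {
  std::string label;
};

struct CacheManagerSketch {
  // Type-erased: the generic side only ever sees std::any, never TA types.
  std::function<void(std::any)> product_node_visitor;
};

// Provider side: recover the node from the std::any(std::cref(node)) wrapper.
const Node& node_of(const std::any& a) {
  return std::any_cast<std::reference_wrapper<const Node>>(a).get();
}

// Generic side, at a binary-Product node: call the visitor (if any) first,
// then compute the product as usual. `lhs * rhs` stands in for prod().
double evaluate_product_sketch(const Node& node, double lhs, double rhs,
                               const CacheManagerSketch& cache) {
  if (cache.product_node_visitor)  // empty (default) visitor: a no-op
    cache.product_node_visitor(std::any(std::cref(node)));
  return lhs * rhs;
}
```

A counting visitor that only records the node mirrors the [shape-provider] test: the visit count rises, and the result equals the no-visitor reference.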

…ion layer

Grow the Task 1 product-node seam into a shaped-product hook that actually
applies a provider-returned TA::SparseShape. The hook
(CacheManager::shaped_product_hook_) is type-erased as
  ResultPtr(any node, Result const& left, Result const& right, ann),
so eval.hpp and cache_manager.hpp stay free of TiledArray types; all TA
specifics (trange computation, provider call, set_shape) live in the TA
backend (eval_context.hpp + result.hpp).

At the binary-Product site, eval consults the hook before prod(): a non-null
return replaces the product, a null return (empty hook or provider nullopt)
falls through to the existing prod(). Default-empty => byte-identical.
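A minimal sketch of that dispatch rule, assuming a toy `Result` hierarchy. `ScalarResult`, `prod_sketch`, and `eval_product_sketch` are hypothetical names; only the hook's argument order follows the signature quoted above.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical type-erased result hierarchy (stand-in for SeQuant's Result).
struct Result {
  virtual ~Result() = default;
};
using ResultPtr = std::shared_ptr<Result>;
struct ScalarResult : Result {
  double v;
  explicit ScalarResult(double x) : v(x) {}
};

// Same shape as the hook described above: (any node, left, right, annot) -> ResultPtr.
using ShapedProductHook =
    std::function<ResultPtr(std::any, const Result&, const Result&, const std::string&)>;

// Stand-in for the existing, unshaped prod().
ResultPtr prod_sketch(const Result& l, const Result& r) {
  const auto& a = dynamic_cast<const ScalarResult&>(l);
  const auto& b = dynamic_cast<const ScalarResult&>(r);
  return std::make_shared<ScalarResult>(a.v * b.v);
}

// Binary-Product site: a non-null hook result replaces the product; an empty
// hook or a null return falls through to prod_sketch().
ResultPtr eval_product_sketch(const std::any& node, const Result& l, const Result& r,
                              const std::string& annot, const ShapedProductHook& hook) {
  if (hook) {
    if (ResultPtr shaped = hook(node, l, r, annot)) return shaped;
  }
  return prod_sketch(l, r);
}
```

Because both "no hook" and "hook declined" reach the same `prod_sketch` call, the default path cannot diverge from the pre-hook behavior.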

result.hpp gains:
- detail::result_outer_trange / outer_annot_labels: build the result outer
  TiledRange by matching result outer labels to operand TiledRange1's;
- result_outer_trange_from_results: the same from type-erased operands;
- apply_shaped_product<NumericT,PolicyT>: emits both Task-0 forms selected by
  operand nesting / de_nest --- (lhs(la) * rhs(ra)).set_shape(s) for general
  products (T*T, T*ToT, ToT*ToT->ToT) and lhs(la).dot_inner(rhs(ra)).set_shape(s)
  for the DeNest::True ToT*ToT->flat path --- fencing before
  return so the shape outlives the lazy assignment.
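Here is a hedged sketch of the label-matching step. It assumes a TiledRange1 can be modeled as a vector of tile boundaries, and that a label shared by both operands carries the same TiledRange1 on each (so checking the left operand first is harmless). `TiledRange1` here and `result_outer_trange_sketch` are hypothetical stand-ins, not the TiledArray type or the helper in result.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for TA::TiledRange1: tile boundaries, e.g. {0, 4, 8}.
using TiledRange1 = std::vector<std::size_t>;

// For each result outer label, take the TiledRange1 of the operand mode that
// carries the same label (left operand checked first, then right).
std::vector<TiledRange1> result_outer_trange_sketch(
    const std::vector<std::string>& result_labels,
    const std::vector<std::string>& left_labels, const std::vector<TiledRange1>& left_tr,
    const std::vector<std::string>& right_labels, const std::vector<TiledRange1>& right_tr) {
  std::vector<TiledRange1> out;
  for (const auto& lbl : result_labels) {
    bool found = false;
    for (std::size_t m = 0; m < left_labels.size() && !found; ++m)
      if (left_labels[m] == lbl) { out.push_back(left_tr[m]); found = true; }
    for (std::size_t m = 0; m < right_labels.size() && !found; ++m)
      if (right_labels[m] == lbl) { out.push_back(right_tr[m]); found = true; }
    if (!found) throw std::invalid_argument("result label not in any operand: " + lbl);
  }
  return out;
}
```

For a result annotated (i_2,i_1) from operands (i_1,i_3) and (i_3,i_2), the sketch picks i_2's range from the right operand and i_1's from the left, in result order.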

TAEvalContext::make_hook<NumericT,PolicyT> builds the hook and captures the
provider BY VALUE (Task 1 review minor) so it does not dangle on ctx.
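The dangling risk is easy to reproduce in miniature. In this hypothetical sketch, `EvalContextSketch`, `Provider`, and an `int` standing in for a SparseShape are all assumptions. The init-capture copies the provider into the hook, so the hook stays callable after the context is gone.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Hypothetical provider: node id -> optional "shape" (an int stands in for a
// TA::SparseShape; std::nullopt means "no constraint").
using Provider = std::function<std::optional<int>(int)>;

struct EvalContextSketch {
  Provider result_shape_provider;

  // The init-capture copies the provider into the closure, so the returned
  // hook owns it. Capturing `this` (or a reference to the member) instead
  // would leave the hook dangling once the context is destroyed.
  Provider make_hook() const {
    return [provider = result_shape_provider](int node) -> std::optional<int> {
      if (!provider) return std::nullopt;  // no provider: always decline
      return provider(node);
    };
  }
};
```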

Tests [shape-provider]: a real shape (zero an outer tile) on both a *-form
general product and a dot_inner denest-to-flat product -- zeroed tile is_zero,
survivors equal the unshaped einsum baseline; plus full-ones no-op, nullopt
decline, and no-hook cases all equal to the unshaped reference.

A product may legitimately have a scalar (ResultScalar) operand, which carries
no TiledRange; computing the result outer trange would mis-cast it. Such a
product has no outer tensor to shape, so decline early (return null -> normal
prod) before the trange computation. Required once a real result_shape_provider
is active.
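A sketch of the early decline, assuming operands can be modeled as a `std::variant`. `ScalarOperand`, `TensorOperand`, `outer_trange_stub`, and `hook_sketch` are hypothetical, and `std::nullopt` plays the role of the null return that sends eval to the normal prod().

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <variant>

// Hypothetical operand kinds: a scalar carries no tiled range, a tensor does.
struct ScalarOperand { double value; };
struct TensorOperand { std::size_t rank; };
using Operand = std::variant<ScalarOperand, TensorOperand>;

// Stand-in for the outer-trange computation: it reads tensor metadata from both
// operands, and std::get throws std::bad_variant_access on a scalar operand
// (the analogue of the mis-cast described above).
std::size_t outer_trange_stub(const Operand& l, const Operand& r) {
  return std::get<TensorOperand>(l).rank + std::get<TensorOperand>(r).rank;
}

// Decline (std::nullopt => normal prod) before the trange step if either
// operand is a scalar; otherwise proceed.
std::optional<std::size_t> hook_sketch(const Operand& l, const Operand& r) {
  if (std::holds_alternative<ScalarOperand>(l) || std::holds_alternative<ScalarOperand>(r))
    return std::nullopt;
  return outer_trange_stub(l, r);
}
```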

The result-shape-constraint hook (make_hook) and its apply path
(result_outer_trange_from_results, apply_shaped_product) hardcoded the nested
(ToT) operand kind as DistArray<Tensor<Tensor<NumericT>>>. The CSV/PNO path
produces ToT operands with an arena-pinned inner tile
(DistArray<Tensor<ArenaTensor<NumericT>>>), a distinct (exact-type-id) Result
kind, so the is_tensor_like guard declined every CSV intermediate -- including
the (g.C)(g.C) giant the feature targets -- before the provider was consulted.

Add an InnerTileT template parameter (default TA::Tensor<NumericT>, so existing
behavior is unchanged) threaded through make_hook ->
result_outer_trange_from_results / apply_shaped_product, so the consumer can
instantiate the hook with the arena inner tile and have CSV ToT intermediates
recognized and shaped.
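Exact-type matching on `std::any` shows why the default-typed guard missed the CSV operands. The types below are hypothetical stand-ins named after the TA ones, and `is_tot_operand` only mimics an exact-type-id check; it is not the real is_tensor_like.

```cpp
#include <any>
#include <cassert>
#include <typeinfo>
#include <vector>

// Hypothetical stand-ins named after the TiledArray types.
template <typename T> struct Tensor { std::vector<T> data; };
template <typename T> struct ArenaTensor { std::vector<T> data; };  // arena-pinned inner tile
template <typename Tile> struct DistArray { std::vector<Tile> tiles; };

// Exact-type match on a type-erased result: a ToT whose inner tile type differs
// is a distinct type, so it does not match. InnerTileT defaults to
// Tensor<NumericT>, which keeps the pre-existing behavior for callers that omit it.
template <typename NumericT, typename InnerTileT = Tensor<NumericT>>
bool is_tot_operand(const std::any& result) {
  return result.type() == typeid(DistArray<Tensor<InnerTileT>>);
}
```

With the default, an arena-inner-tile ToT is rejected; instantiating with `ArenaTensor<double>` recognizes it, which is the role the InnerTileT parameter plays for the consumer.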

Also: TiledArray's expression-layer general ToT product cannot emit a result
whose inner annotation needs a non-identity permutation (cont_engine throws).
Detect that case (tot_product_needs_inner_reorder) and decline so the eval
falls through to the unshaped einsum prod() (which handles the reorder) --
lossless, just not shaped on those nodes.
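Here is a sketch of such a check under a loudly stated assumption: the expression engine emits the inner result only in a "natural" order (surviving left inner labels, then surviving right inner labels). The real cont_engine rule may differ, and `natural_inner_order` and `needs_inner_reorder_sketch` are hypothetical; neither is tot_product_needs_inner_reorder itself.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using Labels = std::vector<std::string>;

bool has_label(const Labels& v, const std::string& x) {
  return std::find(v.begin(), v.end(), x) != v.end();
}

// ASSUMPTION (not TiledArray's documented rule): the engine can only emit the
// inner result in "natural" order: surviving left inner labels in left order,
// then surviving right inner labels not already listed, in right order.
Labels natural_inner_order(const Labels& left, const Labels& right, const Labels& result) {
  Labels out;
  for (const auto& l : left)
    if (has_label(result, l)) out.push_back(l);
  for (const auto& r : right)
    if (has_label(result, r) && !has_label(out, r)) out.push_back(r);
  return out;
}

// True when the requested inner annotation differs from the natural order,
// i.e. it would need a non-identity inner permutation: the case to decline.
bool needs_inner_reorder_sketch(const Labels& left, const Labels& right, const Labels& result) {
  return natural_inner_order(left, right, result) != result;
}
```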

With this, the cross-pair (g.C)(g.C) giant is shaped to its surviving-pair
support (3x smaller in a water-trimer test: 23.65 MB -> 7.88 MB), energy
preserved to ~1e-14.

…mized builds

The "quadratic bubble" test ran single_term_opt on a 12-leaf network ~29
times, only one of which was ever asserted; the rest were a diagnostic
std::wcout sweep. In Debug the DP is ~100x slower (~100 s/call, ~48 min
total), exceeding CTest's default timeout and failing CI's Debug builds
(and cancelling the Valgrind/Sanitizer jobs).

Collapse the sweep to 5 verified early-K/late-K crossover assertions and
gate the test on __OPTIMIZE__. NDEBUG is defined in every SeQuant build
type (asserts use SEQUANT_ASSERT_BEHAVIOR_), so it cannot distinguish
Debug from Release; __OPTIMIZE__ is set by GCC/Clang only at -O1+. The
test now runs in ~4 s in Release and is excluded entirely at -O0.
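A sketch of such a gate, relying only on the GCC/Clang behavior described above. `expensive_tests_enabled` is a hypothetical helper, and the Catch2 wrapping appears only as a comment.

```cpp
#include <cassert>

// GCC/Clang define __OPTIMIZE__ only at -O1 and above. Unlike NDEBUG (defined
// in every SeQuant build type), it therefore separates -O0 from optimized builds.
constexpr bool expensive_tests_enabled() {
#if defined(__OPTIMIZE__)
  return true;
#else
  return false;
#endif
}

// In a Catch2 test file the same condition would wrap the whole test case, e.g.
//   #if defined(__OPTIMIZE__)
//   TEST_CASE("quadratic bubble") { ... }
//   #endif
// so at -O0 the test is not compiled, registered, or run at all.
```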

evaleev force-pushed the feature/result-shape-nonnull-pair-constraints branch from 9f4ccb9 to 1cb3052 on June 29, 2026 17:53
Base automatically changed from feature/cost-model-batch-aware to master June 29, 2026 19:04
evaleev added 2 commits June 29, 2026 23:32
mode_batches_of_trange1 closed a batch as soon as the accumulated whole-tile
size reached or exceeded target_batch_size, so the realized batch could exceed
the target by nearly a full tile: any target a hair above the tile size rounded
UP to two tiles, doubling the batch. That defeats the memory bound the target is
meant to enforce and, for CSV/PNO giant intermediates, doubled the materialized
aux (K) slice (e.g. aux_target_size=243 with 236-wide K tiles -> 472-wide
batches), exposing a TiledArray SUMMA sparse-broadcast edge case.

Close the batch before a tile would push it over the target, so the realized
batch never exceeds target_batch_size except for the one-tile floor (a lone tile
larger than the target). The batch count now changes only at multiples of the
tile size.

Docs updated to reflect the upper-bound (<=) semantics at the BatchPolicy
interface, the runtime evaluator, and the optimizer cost model.
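The difference between the two closing rules shows up in a hedged sketch that works on plain tile widths rather than a TiledRange1. `batch_widths_old` and `batch_widths_new` are hypothetical helpers, not the actual mode_batches_of_trange1 code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Old rule (hypothetical sketch): close the batch once the accumulated width
// reaches or exceeds the target, so a batch can overshoot by almost a tile.
std::vector<std::size_t> batch_widths_old(const std::vector<std::size_t>& tile_widths,
                                          std::size_t target) {
  std::vector<std::size_t> out;
  std::size_t acc = 0;
  for (std::size_t w : tile_widths) {
    acc += w;
    if (acc >= target) { out.push_back(acc); acc = 0; }
  }
  if (acc > 0) out.push_back(acc);
  return out;
}

// New rule (hypothetical sketch): close the batch *before* a tile would push it
// over the target; a lone tile wider than the target still forms a batch
// (the one-tile floor).
std::vector<std::size_t> batch_widths_new(const std::vector<std::size_t>& tile_widths,
                                          std::size_t target) {
  std::vector<std::size_t> out;
  std::size_t acc = 0;
  for (std::size_t w : tile_widths) {
    if (acc > 0 && acc + w > target) { out.push_back(acc); acc = 0; }
    acc += w;
  }
  if (acc > 0) out.push_back(acc);
  return out;
}
```

With four 236-wide tiles and a target of 243, the old rule yields two 472-wide batches and the new rule yields four 236-wide batches, matching the aux_target_size example in the message above.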

The result-shape shaped-product path (result.hpp::apply_shaped_product) and its
eval test use TiledArray's dot_inner (ToT*ToT->T) expression, added after the
previously pinned b8c1d75 -- so the Linux/MacOS Build CI failed to compile
test_eval_ta.cpp ("no member named 'dot_inner'"). Bump to f20abfb44 (the tag
MPQC tracks) which provides dot_inner. Forward bump (53 commits); MADNESS
follows transitively via TiledArray.