feat(grading): hidden-criteria firewall + held-out blend as a substrate primitive #283
Merged
…te primitive
Lift the held-out / hidden-criteria grading firewall out of the coding
benchmark example into a domain-agnostic primitive so any domain (research,
legal, tax, content) can grade an agent on criteria it never saw.
Two reusable, domain-free pieces, composed from existing types (JudgeScore),
no node/test/TS/exec baked in:
- Field routing by destination: a scenario tags each field agent-visible /
develop-against / grading-only / judge-only; routeFields + assertNoHiddenLeak
enforce that a grading-only/judge-only value never reaches the agent context
(fail-loud ValidationError).
- Hidden-criteria grading: the domain supplies its own
(artifact, hiddenCriteria) => { passRate, total } grader; the substrate
provides firewall enforcement (gradeOnHidden) + the held-out-weighted
composite (blendHeldout / withHeldoutBlend, default 0.7/0.3).
The coding node-test executor stays in the example as ONE grader implementation.
20 focused tests on a non-coding (legal) domain prove the firewall rejects a
leaked grading-only field and that blendHeldout composes correctly.
tangletools
approved these changes
Jun 24, 2026
✅ Auto-approved PR — 7e582fce
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T18:08:33Z
What
Lifts the held-out / hidden-criteria grading FIREWALL out of the coding-benchmark example (in agent-runtime) into a domain-agnostic substrate primitive, so any domain — research, legal, tax, content — can grade an agent on hidden criteria it never saw, not just coding.
New module: `src/hidden-criteria-grading.ts`. Additive subpath on the root index: no breaking change, zero consumer updates needed.

The two reusable, domain-free pieces
The coding-LOCAL execution mechanism (`node --test`, TAP parsing) stays in the example. Only the general pieces are lifted, composed from existing types (`JudgeScore`): nothing reinvented, no node/test/TS/exec/regex baked into the substrate.

1. Field routing by destination (the firewall as a type). A scenario tags each field by where it is allowed to flow:
`FieldDestination`: `agent-visible`, `develop-against`, `grading-only`, `judge-only`
- `routeFields(routing, values)` builds the routed field set from a domain's (field → destination) + (field → value) maps (fail-loud on a missing value).
- `assertNoHiddenLeak(fields, agentContext)` is the firewall: throws `ValidationError` if any `grading-only`/`judge-only` value appears in the exact text that reaches the agent.
- `agentVisibleFields(...)` returns the safe-to-render fields so a caller assembles the context from the routing instead of hand-picking.

2. Hidden-criteria grading. The domain supplies its own grader; the substrate supplies firewall enforcement + the composite:
- `HiddenCriteriaGrader<TArtifact, THidden> = (artifact, hiddenCriteria, signal?) => { passRate, total }`: the one seam a non-coding domain implements. The coding node-test executor is ONE implementation a consumer plugs in.
- `gradeOnHidden({ artifact, hiddenCriteria, grader, firewall })`: re-asserts the firewall at grading time on the real agent context, then runs the grader.
- `hiddenGrade(passed, total)`: the single-sourced honest-zero pass-rate rule (`total === 0` → `passRate` 0, never a spurious pass).
- `blendHeldout(heldoutPassRate, judgeScore, weights?)`: the composite (default 0.7 hidden correctness / 0.3 judge quality; weights renormalized; inputs clamped to [0,1]).
- `withHeldoutBlend(score, heldoutPassRate, weights?)`: wraps a judge's `score` so the reported `composite` becomes the held-out-weighted blend (passes a `failed` verdict through untouched).

How a NON-coding domain plugs in
```ts
import { routeFields, gradeOnHidden, blendHeldout, hiddenGrade } from '@tangle-network/agent-eval'

// question, sample, required, rubric, artifact, hiddenCriteria, agentContext and
// judgeComposite come from the domain's own scenario and run; they are not defined here.

// 1. Declare where each field flows
const fields = routeFields(
  { question: 'agent-visible', sample: 'develop-against', required: 'grading-only', rubric: 'judge-only' },
  { question, sample, required, rubric },
)

// 2. Bring YOUR OWN grader: no node/test here
const legalGrader = (artifact: { brief: string }, hidden: { mustCite: string[] }) =>
  hiddenGrade(hidden.mustCite.filter(c => artifact.brief.includes(c)).length, hidden.mustCite.length)

// 3. Grade behind the firewall, blend with the judge
const heldout = await gradeOnHidden({ artifact, hiddenCriteria, grader: legalGrader, firewall: { fields, agentContext } })
const score = blendHeldout(heldout.passRate, judgeComposite)
```
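For reviewers who want to sanity-check the arithmetic behind `score`, here is a minimal, self-contained sketch of the honest-zero rule and the weighted blend as described above. It is not the module's source: `BlendWeights`, `sketchHiddenGrade` and `sketchBlend` are hypothetical names, and rejecting a non-positive weight sum with a `RangeError` is an assumption about what the zero-sum guard does.

```typescript
// Hypothetical stand-ins for illustration; not the actual module code.
interface BlendWeights { heldout: number; judge: number }

const SKETCH_DEFAULT_WEIGHTS: BlendWeights = { heldout: 0.7, judge: 0.3 }

// Clamp any input into [0, 1] before it enters the blend.
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x))

// Honest-zero rule: with no hidden criteria, the pass rate is 0, not a pass.
function sketchHiddenGrade(passed: number, total: number): { passRate: number; total: number } {
  return { passRate: total === 0 ? 0 : clamp01(passed / total), total }
}

function sketchBlend(
  heldoutPassRate: number,
  judgeScore: number,
  weights: BlendWeights = SKETCH_DEFAULT_WEIGHTS,
): number {
  const sum = weights.heldout + weights.judge
  // ASSUMPTION: negative weights or a non-positive sum are rejected loudly.
  if (weights.heldout < 0 || weights.judge < 0 || !(sum > 0)) {
    throw new RangeError('blend weights must be non-negative and sum to a positive number')
  }
  // Renormalize so the two weights always add up to 1.
  const wHeldout = weights.heldout / sum
  const wJudge = weights.judge / sum
  return wHeldout * clamp01(heldoutPassRate) + wJudge * clamp01(judgeScore)
}

// 1 of 2 hidden citations found, judge quality 0.9:
// 0.7 * 0.5 + 0.3 * 0.9 = 0.62
const demoGrade = sketchHiddenGrade(1, 2)
const demoComposite = sketchBlend(demoGrade.passRate, 0.9)
console.log(demoComposite.toFixed(2)) // prints "0.62"
```

With the default weights, a perfect judge score cannot lift the composite above 0.3 when every hidden criterion fails, which is the point of weighting held-out correctness more heavily.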
Tests
20 focused tests on a non-coding (legal-brief) domain — proving the firewall has no domain coupling. They cover the two required proofs explicitly:
- `assertNoHiddenLeak` / `gradeOnHidden` reject a `grading-only` (and `judge-only`) field reaching the agent context.
- `blendHeldout` composes correctly (default + renormalized weights, clamping, zero-sum guard, `withHeldoutBlend` composite replacement + failed-verdict pass-through).

Verification
`pnpm typecheck` + `pnpm build` + `pnpm test` (251 files / 2581 tests) + `pnpm lint` + `pnpm run verify:package`: all green. Version trio bumped together: npm `package.json`, `clients/python/pyproject.toml`, `__init__.py` → 0.100.0.

Grain mirrors the recently-landed `treatment-gate.ts`: pure predicates + pure composition, fail-loud, parameterized matchers/graders, no domain literal in the module. Placed next to `test-graded-scenario.ts` / `partition-held-out.ts` (a scorecard/grading concept that makes sense without a running loop).
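To make the "pure predicates, fail-loud" grain concrete, the following is a rough sketch of a destination-tagged leak check in that style. It is an illustration under stated assumptions, not the code in `src/hidden-criteria-grading.ts`: the names `sketchRoute`, `sketchAssertNoLeak`, `sketchVisible` and `SketchValidationError` are hypothetical, field values are assumed to be plain strings, the leak test is assumed to be a substring match, and `develop-against` is assumed to be safe to render.

```typescript
type Destination = 'agent-visible' | 'develop-against' | 'grading-only' | 'judge-only'

interface RoutedField { name: string; destination: Destination; value: string }

// Hypothetical error type standing in for the substrate's ValidationError.
class SketchValidationError extends Error {}

// Destinations whose values must never reach the agent.
const HIDDEN: ReadonlySet<Destination> = new Set<Destination>(['grading-only', 'judge-only'])

// Pair each routed field with its value; fail loudly if a value is missing.
function sketchRoute(
  routing: Record<string, Destination>,
  values: Record<string, string | undefined>,
): RoutedField[] {
  return Object.entries(routing).map(([name, destination]) => {
    const value = values[name]
    if (value === undefined) throw new SketchValidationError(`missing value for field "${name}"`)
    return { name, destination, value }
  })
}

// Pure predicate over (fields, context): throw if any hidden value appears verbatim.
function sketchAssertNoLeak(fields: RoutedField[], agentContext: string): void {
  for (const f of fields) {
    // Empty values are skipped because ''.includes-style checks always match.
    if (HIDDEN.has(f.destination) && f.value.length > 0 && agentContext.includes(f.value)) {
      throw new SketchValidationError(`hidden field "${f.name}" leaked into the agent context`)
    }
  }
}

// ASSUMPTION: everything that is not hidden may be rendered to the agent.
function sketchVisible(fields: RoutedField[]): RoutedField[] {
  return fields.filter(f => !HIDDEN.has(f.destination))
}

const demoFields = sketchRoute(
  { question: 'agent-visible', required: 'grading-only' },
  { question: 'Is the clause enforceable?', required: 'Smith v. Jones (1999)' },
)
// Building the context from the routing keeps the hidden citation out of it.
const demoContext = sketchVisible(demoFields).map(f => f.value).join('\n')
sketchAssertNoLeak(demoFields, demoContext) // does not throw
```

Nothing in the sketch names a domain: the legal citation appears only in the demo data, which is the property the description claims for the real module.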