A semantic validator for JSON-stat 2.0. It reuses the official JSON Schema 2020-12 definitions for the structural pass and layers a rules engine on top to check the cross-field cube invariants that JSON Schema cannot express.
Designed in DESIGN.md. Architectural companion to the LLM-Wiki JSON-stat knowledge base.
JSON Schema can validate shape (required properties, oneOfs, enums, the IANA link regex) but
cannot express relationships like "value array length must equal the product of size". This
package implements exactly those checks (the S/D/C catalogue) with a stable, versioned error-code
vocabulary.
{
"version": "2.0",
"class": "dataset",
"id": ["sex", "year"],
"size": [2, 3],
"dimension": { "sex": { "category": { "index": ["M", "F"] } }, "year": { "category": { "index": { "2022": 0, "2023": 1, "2024": 2 } } } },
"value": [10, 20, 30, 40, 50]
}
A schema engine says "this is shaped like a dataset". jsonstat-validator tells you the actual
problem: value has 5 entries but product(size) is 2 × 3 = 6 — see
VALUE_LEN_MISMATCH.
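
The check behind that message can be sketched in a few lines. This is only an illustration of the invariant (dense `value` length equals the product of `size`), not the package's actual rule code; the `DatasetLike` and `LengthFinding` types and both function names are assumptions made for the example.

```typescript
// Illustrative sketch only: hypothetical names, not the package's implementation.
interface DatasetLike {
  size: number[];
  value: unknown[] | Record<string, unknown>;
}

interface LengthFinding {
  code: "VALUE_LEN_MISMATCH";
  path: "/value";
  expected: number;
  actual: number;
}

// Product of all dimension sizes; an empty size array yields 1.
function productOfSize(size: number[]): number {
  return size.reduce((acc, n) => acc * n, 1);
}

// Returns a finding when a dense (array) value has the wrong length, else null.
// Sparse (object) values are skipped by this sketch.
function checkValueLength(doc: DatasetLike): LengthFinding | null {
  if (!Array.isArray(doc.value)) return null;
  const expected = productOfSize(doc.size);
  const actual = doc.value.length;
  return actual === expected
    ? null
    : { code: "VALUE_LEN_MISMATCH", path: "/value", expected, actual };
}

// The example document above: size [2, 3] with only five values.
const finding = checkValueLength({ size: [2, 3], value: [10, 20, 30, 40, 50] });
console.log(finding);
```

A rule like this cannot be written in JSON Schema because it relates the contents of one property (`size`) to the length of another (`value`).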
Pick the surface that matches your stack. They all share one
rules-manifest.json and one corpus/cases.json, so
they produce identical findings on identical input.
| Surface | Package | Install |
|---|---|---|
| TypeScript / Node | @jsonstat-validator/ts | npm i @jsonstat-validator/ts |
| CLI | jsonstat-validate | npx jsonstat-validate (no install) |
| Browser / CDN | — | <script src> (see Web / CDN) |
| Rust | jsonstat-validator | cargo add jsonstat-validator |
| Wasm (JS) | @jsonstat-validator/wasm | npm i @jsonstat-validator/wasm |
npm install @jsonstat-validator/ts
import { validate } from "@jsonstat-validator/ts";
const result = validate(doc); // doc may be an object OR a JSON string
console.log(result.valid); // true / false
console.log(result.summary); // { errors, warnings, infos, structuralErrors, byCode }
for (const f of result.findings) { // Finding[] — empty when valid
console.log(`[${f.severity}] ${f.code} ${f.path} — ${f.message}`);
}
validate() accepts a parsed object or a raw JSON string (it parses for you). To validate a file
on disk, use the async validateFile():
import { validateFile } from "@jsonstat-validator/ts";
const result = await validateFile("./my-cube.json");
Structural (JSON Schema) violations are normalized into the same shape with code: "STRUCTURAL_VIOLATION" and the offending keyword kept in meta, so consumers only ever handle one finding shape.
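
Because structural and semantic problems share one shape, a consumer can separate them on the `code` field alone. The sketch below assumes a `Finding` interface inferred from the fields this README uses; the real type may carry more fields, and the `keyword` key inside `meta` is a guess at where the offending JSON Schema keyword is stored.

```typescript
// Assumed shape, inferred from the fields shown in this README.
interface Finding {
  code: string;
  severity: "error" | "warning" | "info";
  path: string;
  message: string;
  meta?: Record<string, unknown>;
}

// Split findings into structural and semantic groups using only the code field.
function partitionFindings(findings: Finding[]): { structural: Finding[]; semantic: Finding[] } {
  const structural: Finding[] = [];
  const semantic: Finding[] = [];
  for (const f of findings) {
    (f.code === "STRUCTURAL_VIOLATION" ? structural : semantic).push(f);
  }
  return { structural, semantic };
}

// Hand-written sample data; the messages and the meta.keyword key are hypothetical.
const sample: Finding[] = [
  { code: "STRUCTURAL_VIOLATION", severity: "error", path: "/id", message: "must be array", meta: { keyword: "type" } },
  { code: "VALUE_LEN_MISMATCH", severity: "error", path: "/value", message: "length mismatch" },
];
const groups = partitionFindings(sample);
console.log(groups.structural.length, groups.semantic.length);
```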
validate(doc, {
mode: "full", // "full" | "structural" | "semantic"
minSeverity: "info", // "error" | "warning" | "info" — filters returned findings
maxCollectionDepth: 3, // cap nested-collection recursion
continueOnStructuralError: true,
budget: { maxCells: 50_000_000, maxBytes: 200 * 1024 * 1024, maxFindings: 1000 },
onFinding: (f) => { /* streaming sink, for very large docs */ },
});
No install needed: run it once with npx:
# validate a file
npx jsonstat-validate my-cube.json
# validate JSON on stdin (note the `-`)
echo '{"version":"2.0","class":"dataset","id":["x"],"size":[2],"dimension":{"x":{"category":{"index":["a","b"]}}},"value":[1]}' \
| npx jsonstat-validate -
# machine-readable output for CI
npx jsonstat-validate my-cube.json --format json
$ npx jsonstat-validate my-cube.json
valid: false
summary: 1 errors, 0 warnings, 0 infos, 0 structural
[error] VALUE_LEN_MISMATCH /value — Dense 'value' length 1 must equal product(size) = 2.
Exit code is 0 when valid, 1 when invalid, so you can drop it straight into a pipeline or pre-commit hook:
npx jsonstat-validate data/*.json || exit 1
Options:
jsonstat-validate <file|-> [options]
--mode full|structural|semantic validation phases to run (default: full)
--format json|text output format (default: text)
--min-severity error|warning|info only show findings at/above this severity (default: info)
--structural-only alias for --mode structural
--semantic-only alias for --mode semantic
The TS package is dependency-light and browser-safe, so you can use it straight from a CDN with no build step. Works on esm.sh, jsDelivr, and unpkg.
<script type="module">
// esm.sh serves the ESM browser build:
import { validate } from "https://esm.sh/@jsonstat-validator/ts";
// or jsDelivr / unpkg:
// import { validate } from "https://cdn.jsdelivr.net/npm/@jsonstat-validator/ts/+esm";
const result = validate(myJsonstatDocument);
console.log(result.valid, result.summary);
</script>
<!-- Minified IIFE bundle from jsDelivr (also on unpkg) -->
<script src="https://cdn.jsdelivr.net/npm/@jsonstat-validator/ts/dist/browser/jsonstat-validator.min.js"></script>
<script>
const { validate } = window.JsonstatValidator;
const result = validate(myJsonstatDocument);
console.log(result.valid, result.findings);
</script>
The browser build excludes validateFile() (it needs node:fs). Use validate() in the browser and fetch the document yourself.
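
Fetching the document yourself might look like the sketch below. The helper name `validateFromUrl` and the `ValidateFn` / `FetchLike` types are made up for this example; the validator and the fetch function are passed in as parameters so the snippet does not depend on any particular import path or runtime. It relies only on the documented fact that `validate()` accepts a raw JSON string.

```typescript
// Hypothetical helper: not part of the package.
type ValidateFn = (doc: unknown) => { valid: boolean };
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Download a JSON-stat document and hand the raw text to the validator.
async function validateFromUrl(url: string, validate: ValidateFn, fetchImpl: FetchLike) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }
  // validate() is documented to accept a raw JSON string, so no JSON.parse here.
  return validate(await response.text());
}

// In a browser this could be called as:
//   const result = await validateFromUrl("/data/my-cube.json", validate, (u) => fetch(u));
```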
npm install @jsonstat-validator/ts
import { validate } from "@jsonstat-validator/ts";
The package's exports map points bundlers at the browser build automatically, so the same import
works in Node and the browser with no special config.
cargo add jsonstat-validator
use jsonstat_validator::{validate_from_str, ValidateOptions};
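fn main() {
// Read the document to validate; the file name is only an example.
let json = std::fs::read_to_string("my-cube.json").expect("failed to read my-cube.json");
let result = validate_from_str(&json, &ValidateOptions::default());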
println!("valid: {}", result.valid);
for f in &result.findings {
println!("[{}] {} {} — {}", f.severity, f.code, f.path, f.message);
}
}
Or build and run the example binary directly from a checkout:
cd crates/validator && cargo test # corpus parity
echo '{"version":"2.0","class":"dataset","id":["x"],"size":[2],"dimension":{"x":{"category":{"index":["a","b"]}}},"value":[1]}' \
| cargo run --example validate -- -
The Wasm surface is a thin JS wrapper over the Rust crate, exposing the same validate() shape
as @jsonstat-validator/ts. The crate is compiled to WebAssembly with wasm-pack (--target web).
npm install @jsonstat-validator/wasm
import { init, validate } from "@jsonstat-validator/wasm";
await init(); // load the .wasm once (the browser fetches it)
const result = validate(doc); // same ValidationResult shape as @jsonstat-validator/ts
console.log(result.valid, result.summary);
In Node, import from @jsonstat-validator/wasm/node instead: it auto-loads the .wasm from disk
and also exports validateFile(). See packages/wasm/README.md for the
full surface, and note that options.onFinding / options.budget are not carried across the
JS↔wasm boundary (findings are still returned in full).
The Wasm surface emits the same semantic findings as the TS and Rust surfaces — enforced by a TS↔Wasm corpus parity test that runs in CI.
See rules-manifest.json for the authoritative, append-only catalogue. Codes
include VALUE_LEN_MISMATCH, SPARSE_KEY_OUT_OF_RANGE, STATUS_LEN_MISMATCH, DIM_KEY_ID_MISMATCH,
ID_SIZE_LEN_MISMATCH, ROLE_ID_UNKNOWN, INDEX_COUNT_MISMATCH, INDEX_POSITIONS_INVALID,
LABEL_KEY_UNKNOWN, UNIT_KEY_UNKNOWN, COORD_KEY_UNKNOWN, NOTE_KEY_UNKNOWN, CHILD_ID_UNKNOWN,
CHILD_CYCLE, METRIC_UNIT_MISSING (warning), BUNDLE_DEPRECATED (info), CUBE_SIZE_OVERFLOW,
PARSE_ERROR, and STRUCTURAL_VIOLATION. The vocabulary is versioned independently
(meta.ruleSetVersion) from the package SemVer — see DESIGN.md §4.4 for the full
S/D/C rule table.
The repo is an npm workspace plus a Cargo crate. From a checkout:
# TypeScript: npm install runs the `prepare` hook, which auto-builds both
# workspaces (so `npx jsonstat-validate` works immediately) and re-links the bin.
npm install
npm test # 27/27 corpus + rule tests
# After editing TS/CLI sources, rebuild:
npm run build # regenerates @jsonstat-validator/ts and jsonstat-validate
# Rust:
cargo test --manifest-path crates/validator/Cargo.toml
The manifest and vendored schemas live at the repo root as the single source of truth shared with the
Rust surface. They are not read from disk at runtime (that would break npm publish and browser
use). Instead packages/ts/tools/gen-assets.mjs compiles them into
packages/ts/src/generated/ TypeScript modules at build time, so the
published tarball carries everything it needs. The Rust crate does the equivalent with include_str!.
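
The build-time embedding idea can be sketched as follows. This is not the contents of gen-assets.mjs, whose output format is not shown here; `renderAssetModule` and the `rulesManifest` export name are hypothetical. The point is only that a JSON asset becomes a TypeScript module with the data inlined, so nothing needs to be read from disk at runtime.

```typescript
// Hypothetical sketch of compiling a JSON asset into a TypeScript module.
// Render module source that exports the given data as an inlined literal.
function renderAssetModule(exportName: string, data: unknown): string {
  const literal = JSON.stringify(data, null, 2);
  return [
    "// AUTO-GENERATED at build time. Do not edit by hand.",
    `export const ${exportName} = ${literal} as const;`,
    "",
  ].join("\n");
}

// A tiny made-up manifest fragment, used only to show the generated text.
const source = renderAssetModule("rulesManifest", {
  ruleSetVersion: "1.0.0",
  rules: [{ code: "VALUE_LEN_MISMATCH", severity: "error" }],
});
console.log(source);
```

A build script would write the returned string to a file under the generated-sources directory before the TypeScript compile step runs.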
packages/ts/tools/build-browser.mjs uses
esbuild to emit two self-contained files under dist/browser/:
jsonstat-validator.min.js (IIFE, global JsonstatValidator) and jsonstat-validator.mjs (ESM).
Releases are cut by release.yml, which publishes to two places on a
GitHub Release (or workflow_dispatch):
- npm (the @jsonstat-validator/ts, jsonstat-validate, and @jsonstat-validator/wasm packages) uses npm Trusted Publishing (OpenID Connect), with no stored token. Prerequisite: each package is registered as a Trusted Publisher trusting this repo + workflow file on the npm side.
- crates.io (the jsonstat-validator crate) has no OIDC equivalent, so it authenticates with a long-lived CARGO_REGISTRY_TOKEN stored as an environment secret named exactly that, under a GitHub environment named exactly crates-io (matching environment: crates-io in the workflow). The token's first publish needs publish-new scope (creating the crate) plus publish-update (every version after); once the crate exists, rotate it to a crate-scoped publish-update token.
Recommended repo setup before the first publish:
- Settings → Environments → New environment → name it crates-io; optionally enable Required reviewers and limit Deployment branches and tags.
- On that environment → Add secret → name CARGO_REGISTRY_TOKEN, paste a crates.io API token.
- Dry-run locally: cargo publish --dry-run --manifest-path crates/validator/Cargo.toml.
- Tag and push vX.Y.Z, create the GitHub Release; both jobs run (crate job awaits approval if you set a reviewer).
Always commit the root rules-manifest.json bump and the Rust _vendored snapshot together:
vendored_parity.rs compares committed blobs, so they must land in the same commit or that test fails.
Three native surfaces ship and are kept at parity — they emit identical findings on identical input:
- TypeScript / Node: @jsonstat-validator/ts plus the jsonstat-validate CLI.
- Rust: the jsonstat-validator crate.
- Wasm: @jsonstat-validator/wasm, a thin JS wrapper over the crate.
Parity is enforced, not asserted: one shared corpus/cases.json drives the
TS and Rust suites and the TS↔Wasm parity test that runs in CI, closing the TS ↔ Rust ↔ Wasm
triangle. A committed-snapshot guard
(crates/validator/tests/vendored_parity.rs) keeps the
Rust _vendored copy byte-identical to the repo-root rules-manifest.json
and schemas/vendored/, catching pure-metadata drift the corpus test misses.
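
A corpus-driven parity check of this kind can be sketched as below. The `CorpusCase` shape and both function names are assumptions for illustration; the real corpus/cases.json layout and the real test harness may differ. Each surface is modelled as a plain function so the sketch stays independent of how a surface is actually loaded.

```typescript
// Hypothetical shapes for illustration; the real corpus layout may differ.
interface CorpusCase {
  name: string;
  doc: unknown;
}
interface FindingLike {
  code: string;
  path: string;
}
type Surface = (doc: unknown) => { findings: FindingLike[] };

// Order-independent fingerprint of a result: sorted "code path" pairs.
function fingerprint(findings: FindingLike[]): string {
  return findings.map((f) => `${f.code} ${f.path}`).sort().join("\n");
}

// Names of the cases on which the two surfaces disagree; empty means parity holds.
function parityFailures(cases: CorpusCase[], a: Surface, b: Surface): string[] {
  return cases
    .filter((c) => fingerprint(a(c.doc).findings) !== fingerprint(b(c.doc).findings))
    .map((c) => c.name);
}
```

Comparing only `code` and `path` is a design choice of this sketch: it tolerates differences in message wording between surfaces while still catching a missing or extra finding.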
Releases publish via release.yml: npm through Trusted Publishing
(OIDC), and the Rust crate to crates.io through a CARGO_REGISTRY_TOKEN secret.
The package is pre-1.0. engineVersion is 0.3.0; the error-code vocabulary is versioned
independently via meta.ruleSetVersion (1.0.0, append-only within a major) so downstream tooling
can branch on the vocabulary rather than the package version — see DESIGN.md §4.5.
The public validate() surface and result shape are stable in practice, but breaking changes remain
allowed before 1.0.0.
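
Branching on the vocabulary version rather than the package version might look like this. It is a sketch under the assumption that `meta.ruleSetVersion` is a plain `MAJOR.MINOR.PATCH` string, as in the sample result; `ruleSetMajor` and `supportsRuleSet` are hypothetical helper names.

```typescript
// Hypothetical helpers: assume ruleSetVersion looks like "1.0.0".
// Extract the major component, failing loudly on anything unparseable.
function ruleSetMajor(version: string): number {
  const major = Number.parseInt(version.split(".")[0] ?? "", 10);
  if (Number.isNaN(major)) {
    throw new Error(`Unparseable ruleSetVersion: ${version}`);
  }
  return major;
}

// Codes are append-only within a major, so matching the major is enough
// for a consumer that only relies on codes it already knows.
function supportsRuleSet(meta: { ruleSetVersion: string }, supportedMajor: number): boolean {
  return ruleSetMajor(meta.ruleSetVersion) === supportedMajor;
}

console.log(supportsRuleSet({ ruleSetVersion: "1.0.0" }, 1));
```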
Remaining work tracked toward the 1.0.0 cut (see CHANGELOG.md):
- schemas/curated/ de-duplicated schema set, with a curated≡vendored parity test.
- A curated-parity CI job (pending the curated/ schemas above).
- A written SemVer / ruleSetVersion stability policy as the commitment artifact for 1.0.0.
The M1–M5 milestone history is recorded in DESIGN.md §11 and the git log.
Apache-2.0.
{
  "valid": false,
  "findings": [
    {
      "code": "VALUE_LEN_MISMATCH",
      "ruleId": "S3",
      "severity": "error",
      "path": "/value",
      "message": "Dense 'value' length 5 must equal product(size) = 6.",
      "expected": 6,
      "actual": 5,
      "specRef": "wiki/format-specification.md"
    }
  ],
  "summary": { "errors": 1, "warnings": 0, "infos": 0, "structuralErrors": 0, "byCode": { "VALUE_LEN_MISMATCH": 1 } },
  "options": { /* resolved ValidateOptions */ },
  "meta": { "engineVersion": "0.3.0", "ruleSetVersion": "1.0.0", "schemaVersion": "1.05", "durationMs": 3 }
}