Merged
12 changes: 12 additions & 0 deletions ROADMAP.md
@@ -15,6 +15,18 @@ requirements-file dependency change through the generated report and confirm
that local risk buckets remain policy-review evidence rather than package
safety verdicts.

## v1.1: Input and Policy Semantics

The v1.1 route is implementation-led: pin the real input support matrix,
version policy and report contracts, make policy decision evidence explicit,
then harden component identity canonicalization and same-input conflict
diagnostics. The detailed acceptance gates live in
[tools/sbom-diff-and-risk/docs/v1.1-input-and-policy-semantics.md](tools/sbom-diff-and-risk/docs/v1.1-input-and-policy-semantics.md).

The tool remains in this monorepo. Repository extraction is deferred until
real third-party adoption creates an independent release or maintenance
boundary.

## Parked Directions

- Parser-boundary fixtures, only when the before/after input and normalized
17 changes: 11 additions & 6 deletions tools/sbom-diff-and-risk/README.md
@@ -42,6 +42,8 @@ For a consumer-facing GitHub Actions example, see
[docs/github-actions-consumer-example.md](docs/github-actions-consumer-example.md).
For regenerating checked-in local example outputs, see
[docs/example-artifact-regeneration.md](docs/example-artifact-regeneration.md).
The v1.1 implementation sequence is fixed in
[docs/v1.1-input-and-policy-semantics.md](docs/v1.1-input-and-policy-semantics.md).

1. If you want to verify `sbom-diff-and-risk` itself, start with
[docs/verification.md](docs/verification.md).
@@ -93,11 +95,14 @@ When a `purl` includes a version, the tool keeps the full value in `Component.pu

## Supported Formats

- CycloneDX JSON
- SPDX JSON
- `requirements.txt`
- `pyproject.toml` via PEP 621 `[project]` metadata
- `pyproject.toml` dependency groups via PEP 735 `[dependency-groups]` with explicit selection
- CycloneDX JSON, top-level component subset
- SPDX JSON, top-level package subset
- `requirements.txt`, conservative PEP 508 subset
- `pyproject.toml`, PEP 621 arrays and explicitly selected PEP 735 groups

See the test-backed [input format support matrix](docs/parser-boundaries.md)
for the exact fields and unsupported constructs. A recognized container format
does not imply full specification conformance.

## Risk Bucket Semantics

@@ -219,7 +224,7 @@ The checked-in [examples/sample-summary.json](examples/sample-summary.json) arti
For CI dashboard, job-summary, and local-threshold examples, see [docs/summary-json-ci-cookbook.md](docs/summary-json-ci-cookbook.md).

`--policy-json PATH` writes only policy-related JSON sections from the full
report. It includes `policy_evaluation`, policy finding lists, `rule_catalog`,
report. It includes `policy_schema`, `policy_evaluation`, policy finding lists, `rule_catalog`,
and `summary.policy` when policy evaluation is applied. For CI job-summary
examples, see
[docs/policy-decision-ci-cookbook.md](docs/policy-decision-ci-cookbook.md).
48 changes: 42 additions & 6 deletions tools/sbom-diff-and-risk/docs/parser-boundaries.md
@@ -1,10 +1,45 @@
# Parser boundaries
# Input format support matrix and parser boundaries

`sbom-diff-and-risk` intentionally supports a conservative parser subset so local runs remain deterministic, auditable, and CI-friendly.

The project does not try to emulate a package installer. When syntax would require resolver behavior, implicit includes, index lookups, or environment-specific side effects, the parser fails closed with an explicit error.

## requirements.txt
## Support matrix

Every row below describes parser behavior implemented and covered by tests. A
recognized container format does not imply full specification conformance.

| Input | CLI format | Implemented subset | Not claimed |
| --- | --- | --- | --- |
| CycloneDX JSON | `cyclonedx-json` | Top-level `components`; component name, version, purl, bom-ref, type, first usable license, supplier/author, and selected external-reference URLs | CycloneDX schema validation, dependency graphs, services, vulnerabilities, compositions, or recursive nested components |
| SPDX JSON | `spdx-json` | Top-level `packages`; package name, versionInfo, SPDXID, primary purpose, declared/concluded license, supplier/originator, purl external reference, and selected source URLs | SPDX schema validation, relationship graphs, files, snippets, annotations, or license analysis |
| requirements file | `requirements-txt` | The PEP 508 requirement subset listed below, with comments and deterministic line continuation | pip installation semantics, includes, constraints, index configuration, hashes, URLs, VCS references, archives, or local paths |
| `pyproject.toml` | `pyproject-toml` | PEP 621 dependency arrays and explicitly selected PEP 735 dependency groups, including local group includes | A general Python lockfile/parser, build backend interpretation, or Poetry/Hatch/PDM tool-specific tables |

No XML SBOM, SPDX tag-value, YAML SBOM, package-lock, poetry.lock, uv.lock,
or other lockfile parser is currently registered. Such formats are unsupported,
not silently treated as one of the rows above.

## CycloneDX JSON

The parser requires a top-level JSON object with `bomFormat: CycloneDX`. It
reads only the top-level `components` array. Each component requires `name`;
all other normalized fields are optional.

The parser does not currently constrain `specVersion` or validate the document
against a CycloneDX schema. Acceptance therefore means the implemented fields
were readable, not that the entire document is CycloneDX-conformant.
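
The acceptance rule above can be sketched as follows. This is an illustration under assumptions, not the tool's parser: `read_cyclonedx_components` is a hypothetical name, plain `ValueError` stands in for the tool's error types, only four of the normalized fields are shown, and treating a missing `components` key as an empty list is a guess.

```python
import json
from typing import Any


def read_cyclonedx_components(text: str) -> list[dict[str, Any]]:
    """Read only the top-level ``components`` array of a CycloneDX JSON document."""
    document = json.loads(text)
    if not isinstance(document, dict) or document.get("bomFormat") != "CycloneDX":
        raise ValueError("expected a top-level object with bomFormat: CycloneDX")

    # Assumption: a document without ``components`` yields no records.
    components = document.get("components", [])
    if not isinstance(components, list):
        raise ValueError("top-level components must be an array")

    records: list[dict[str, Any]] = []
    for index, component in enumerate(components):
        if not isinstance(component, dict) or not isinstance(component.get("name"), str):
            raise ValueError(f"components[{index}] requires a string name")
        # Nested ``components`` inside a component are intentionally not traversed,
        # and specVersion is not inspected.
        records.append(
            {
                "name": component["name"],
                "version": component.get("version"),
                "purl": component.get("purl"),
                "bom_ref": component.get("bom-ref"),
            }
        )
    return records
```

A successful return therefore says only that these fields were readable, which matches the conformance caveat in the paragraph above.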

## SPDX JSON

The parser requires a top-level JSON object with a string `spdxVersion` and
reads only the top-level `packages` array. Each package requires `name`.

The parser does not currently constrain the SPDX version or validate the
document against an SPDX schema. Relationships and file-level data do not
affect component identity or policy decisions.
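
A comparable sketch for SPDX, again under assumptions: `read_spdx_packages` and `purl_from_external_refs` are hypothetical names, `ValueError` stands in for the tool's error types, and only a few normalized fields are shown. The `externalRefs` entries with `referenceType: purl` and a `referenceLocator` follow the SPDX 2.x JSON layout.

```python
import json
from typing import Any, Optional


def purl_from_external_refs(package: dict[str, Any]) -> Optional[str]:
    """Return the first purl external reference of a package, if any."""
    refs = package.get("externalRefs")
    if not isinstance(refs, list):
        return None
    for ref in refs:
        if isinstance(ref, dict) and ref.get("referenceType") == "purl":
            locator = ref.get("referenceLocator")
            if isinstance(locator, str):
                return locator
    return None


def read_spdx_packages(text: str) -> list[dict[str, Any]]:
    """Read only the top-level ``packages`` array of an SPDX JSON document."""
    document = json.loads(text)
    if not isinstance(document, dict) or not isinstance(document.get("spdxVersion"), str):
        raise ValueError("expected a top-level object with a string spdxVersion")

    # Assumption: a document without ``packages`` yields no records.
    packages = document.get("packages", [])
    if not isinstance(packages, list):
        raise ValueError("top-level packages must be an array")

    records: list[dict[str, Any]] = []
    for index, package in enumerate(packages):
        if not isinstance(package, dict) or not isinstance(package.get("name"), str):
            raise ValueError(f"packages[{index}] requires a string name")
        # Relationships, files, and snippets are never consulted here.
        records.append(
            {
                "name": package["name"],
                "version": package.get("versionInfo"),
                "spdx_id": package.get("SPDXID"),
                "purl": purl_from_external_refs(package),
            }
        )
    return records
```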

## Requirements files

`requirements.txt` is treated as a narrow manifest format, not as "everything pip can do in a file".

@@ -23,18 +58,19 @@ The project does not try to emulate a package installer. When syntax would requi

When unsupported syntax appears, the parser raises `UnsupportedInputError` and the CLI returns exit code `2`.

## pyproject.toml
## `pyproject.toml`

`pyproject.toml` support is also intentionally narrow:

| Section | Status | Notes |
| --- | --- | --- |
| `[project.dependencies]` | Supported | Parsed by default |
| `[project.optional-dependencies]` | Supported | Parsed by default and kept distinct from dependency groups |
| `[project]` `dependencies` | Supported | Parsed by default as a PEP 508 string array |
| `[project.optional-dependencies]` | Supported | Every declared optional group is parsed by default and kept distinct from dependency groups |
| `[dependency-groups]` | Supported | Requires explicit `--pyproject-group <name>` selection |
| `{ include-group = "name" }` inside dependency groups | Supported | Includes are resolved locally and deterministically |
| PEP 508 direct references in supported arrays | Supported | Recorded as local manifest evidence; no URL is fetched by the parser |
| Missing requested dependency group | Explicit error | Reported as `InputSelectionError` |
| Poetry, Hatch, PDM, or other tool-specific dependency sections | Unsupported | Not parsed in v0.2 |
| Poetry, Hatch, PDM, or other tool-specific dependency sections | Unsupported | Not parsed |

Dependency groups are not merged automatically with `[project.optional-dependencies]`. They solve different problems and are kept separate on purpose.
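
Local, deterministic `include-group` resolution can be sketched as a depth-first expansion over the already parsed `[dependency-groups]` table. This is an assumption-laden illustration: `resolve_group` is a hypothetical name, `GroupSelectionError` is a stand-in for the tool's `InputSelectionError`, cycle handling is a guess at reasonable fail-closed behavior, and group-name normalization is omitted.

```python
from typing import Any


class GroupSelectionError(LookupError):
    """Hypothetical stand-in for the tool's InputSelectionError."""


def resolve_group(
    groups: dict[str, Any], name: str, _stack: tuple[str, ...] = ()
) -> list[str]:
    """Expand one dependency group into its requirement strings, in file order."""
    if name not in groups:
        raise GroupSelectionError(f"dependency group not found: {name}")
    if name in _stack:
        # Assumption: include cycles are rejected rather than silently truncated.
        raise ValueError("include cycle: " + " -> ".join(_stack + (name,)))

    requirements: list[str] = []
    for entry in groups[name]:
        if isinstance(entry, str):
            requirements.append(entry)
        elif isinstance(entry, dict) and set(entry) == {"include-group"}:
            # Includes are resolved only against groups in the same file.
            requirements.extend(
                resolve_group(groups, entry["include-group"], _stack + (name,))
            )
        else:
            raise ValueError(f"unsupported entry in group {name!r}: {entry!r}")
    return requirements
```

The `groups` mapping could come from `tomllib.load` on Python 3.11 or later. Nothing here reads `[project.optional-dependencies]`, which keeps the two concepts separate as described above.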

25 changes: 25 additions & 0 deletions tools/sbom-diff-and-risk/docs/policy-decision-explainability.md
@@ -27,6 +27,10 @@ policy findings.

## Field contract

- `matched_rule_id`: Exact policy rule id that produced the decision.
- `exact_evidence`: Stable object containing the compared component key,
finding bucket, configured threshold, and observed value. Non-applicable
members remain `null` rather than disappearing.
- `decision_reason`: Stable reason code for the policy decision.
- `policy_rule`: Policy rule id that produced the decision.
- `severity_source`: Source of the active severity, such as `block_on`,
@@ -36,6 +40,11 @@ policy findings.
decision, when applicable.
- `observed_value`: Observed local value that was compared to the policy rule,
when applicable.
- `confidence_level`: Evidence source level for this decision:
`policy_matched`, `provenance_recorded`, or `scorecard_recorded`.

`confidence_level` describes which recorded evidence the decision used. It is
not a probability and does not express package safety.

The full JSON report shape is documented in [report-schema.md](report-schema.md).
Policy configuration fields and supported rules are documented in
@@ -47,7 +56,15 @@ A policy finding with:

```json
{
"matched_rule_id": "max_added_packages",
"decision_reason": "added_package_count_exceeded_threshold",
"exact_evidence": {
"component_key": null,
"finding_bucket": null,
"matched_threshold": 0,
"observed_value": 1
},
"confidence_level": "policy_matched",
"policy_rule": "max_added_packages",
"severity_source": "block_on",
"matched_threshold": 0,
@@ -63,7 +80,15 @@ A policy finding with:

```json
{
"matched_rule_id": "new_package",
"decision_reason": "risk_finding_matched_policy_rule",
"exact_evidence": {
"component_key": "purl:pkg:pypi/example-package",
"finding_bucket": "new_package",
"matched_threshold": null,
"observed_value": "new_package"
},
"confidence_level": "policy_matched",
"policy_rule": "new_package",
"severity_source": "warn_on",
"matched_threshold": null,
20 changes: 17 additions & 3 deletions tools/sbom-diff-and-risk/docs/policy-schema.md
@@ -1,20 +1,31 @@
# Policy schema

`sbom-diff-and-risk` supports YAML-only policy schemas in versions `1`, `2`,
and `3` for the local, provenance-aware, and optional Scorecard-aware policy
flows described here.
`sbom-diff-and-risk` identifies the serialized policy contract with:

```yaml
policy_schema: sbom-diff-risk.policy.v1
```

The separate integer `version` remains the policy capability level: `1` for
local rules, `2` for provenance-aware rules, and `3` for optional
Scorecard-aware rules. Keeping these concepts separate lets the serialized
contract evolve without relabeling existing rule capability levels.

The schema is intentionally conservative and fail-closed:

- unknown rule ids are rejected
- unknown top-level keys are rejected
- unknown `policy_schema` values are rejected
- invalid types are rejected
- policies that omit `policy_schema` remain readable as v1.0-compatible input
- normalized policy output always records `sbom-diff-risk.policy.v1`
- version `1` remains the v0.2-compatible schema and existing v0.2 policies continue to work unchanged
- version `2` adds provenance-aware gating for explicit PyPI enrichment evidence
- version `3` adds optional Scorecard-aware gating for explicitly requested Scorecard enrichment
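
The fail-closed rules in this list can be sketched as a small normalizer over an already loaded policy mapping. It is illustrative only: `normalize_policy` and `PolicyError` are hypothetical names, the key and rule-id sets are deliberately small subsets, and requiring `version` to be present is an assumption.

```python
from typing import Any

POLICY_SCHEMA_V1 = "sbom-diff-risk.policy.v1"
# Illustrative subsets only; the real schema accepts more keys and rule ids.
KNOWN_TOP_LEVEL_KEYS = {"policy_schema", "version", "block_on", "warn_on"}
KNOWN_RULE_IDS = {"unknown_license", "new_package"}


class PolicyError(ValueError):
    """Hypothetical error type for a rejected policy document."""


def _rule_list(raw: dict[str, Any], key: str) -> list[str]:
    value = raw.get(key, [])
    if not isinstance(value, list) or not all(isinstance(item, str) for item in value):
        raise PolicyError(f"{key} must be a list of rule ids")
    unknown = sorted(set(value) - KNOWN_RULE_IDS)
    if unknown:
        raise PolicyError(f"unknown rule ids in {key}: {unknown}")
    return list(value)


def normalize_policy(raw: dict[str, Any]) -> dict[str, Any]:
    """Validate a loaded policy mapping and return its normalized form."""
    unknown_keys = sorted(set(raw) - KNOWN_TOP_LEVEL_KEYS)
    if unknown_keys:
        raise PolicyError(f"unknown top-level keys: {unknown_keys}")

    # Omitting policy_schema stays readable; any other value is rejected.
    schema = raw.get("policy_schema", POLICY_SCHEMA_V1)
    if schema != POLICY_SCHEMA_V1:
        raise PolicyError(f"unknown policy_schema: {schema!r}")

    version = raw.get("version")
    if type(version) is not int or version not in (1, 2, 3):
        raise PolicyError("version must be the integer 1, 2, or 3")

    return {
        "policy_schema": POLICY_SCHEMA_V1,  # always recorded in normalized output
        "version": version,
        "block_on": _rule_list(raw, "block_on"),
        "warn_on": _rule_list(raw, "warn_on"),
    }
```

Keeping the schema check separate from the `version` check mirrors the split between the serialized contract identifier and the capability level.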

## Version 1 fields

- `policy_schema: sbom-diff-risk.policy.v1`
- `version: 1`
- `block_on: [rule_id, ...]`
- `warn_on: [rule_id, ...]`
@@ -113,6 +124,7 @@ see
## Version 1 example

```yaml
policy_schema: sbom-diff-risk.policy.v1
version: 1
block_on:
- unknown_license
@@ -130,6 +142,7 @@ ignore_rules:
## Version 2 example

```yaml
policy_schema: sbom-diff-risk.policy.v1
version: 2
block_on:
- provenance_required
@@ -147,6 +160,7 @@ allow_unattested_publishers:
## Version 3 example

```yaml
policy_schema: sbom-diff-risk.policy.v1
version: 3
warn_on:
- scorecard_below_threshold
@@ -30,6 +30,7 @@ Reference inputs and policy:
The minimal policy is intentionally small:

```yaml
policy_schema: sbom-diff-risk.policy.v1
version: 1
block_on:
- unknown_license
12 changes: 12 additions & 0 deletions tools/sbom-diff-and-risk/docs/report-schema.md
@@ -15,6 +15,7 @@ JSON reports currently use this top-level structure:

| Field | Description |
| --- | --- |
| `report_schema` | Stable report contract identifier; currently `sbom-diff-risk.report.v1`. |
| `summary` | Compact run summary for deterministic machine consumption. |
| `evidence_confidence` | Highest evidence-confidence level represented by this report. |
| `components` | Added, removed, and changed component records. |
@@ -53,6 +54,11 @@ These fields describe why a local policy rule produced a block, warning, or
suppression. They are policy-decision metadata only; they are not dependency
safety verdicts, CVE results, or proof that a package is safe or unsafe.

- `matched_rule_id`: Exact policy rule id that produced the decision. The
legacy `rule_id` and `policy_rule` fields remain available in report v1.
- `exact_evidence`: Structured local comparison evidence with
`component_key`, `finding_bucket`, `matched_threshold`, and
`observed_value`. Values remain `null` when that dimension does not apply.
- `decision_reason`: Stable reason code for the policy decision, such as
`risk_finding_matched_policy_rule`,
`added_package_count_exceeded_threshold`, or
@@ -66,6 +72,9 @@ safety verdicts, CVE results, or proof that a package is safe or unsafe.
decision, when applicable.
- `observed_value`: Observed local value that was compared to the policy rule,
when applicable.
- `confidence_level`: Evidence level used for this decision. Local policy
matches use `policy_matched`; decisions based on recorded provenance or
Scorecard data use `provenance_recorded` or `scorecard_recorded`.

Explanation fields appear only on policy finding objects. Risk findings in
`risks` remain the analyzer's local heuristic findings and do not receive
@@ -80,6 +89,7 @@ consumer snippets, see
The `--policy-json PATH` CLI option writes a policy-only JSON sidecar using the
same policy-related sections from the full JSON report:

- `policy_schema`, currently `sbom-diff-risk.policy.v1`
- `policy_evaluation`
- `blocking_findings`
- `warning_findings`
@@ -170,6 +180,8 @@ stable for tests and downstream consumers.
## Stability notes

- JSON reports are intended for machine consumption.
- Consumers should select the contract using `report_schema` before relying on
required fields.
- Golden samples lock important output shape for stable reviewer and CI expectations.
- The schema is conservative and additive where possible.
- Missing `summary.policy` means policy was not applied.
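
A consumer-side sketch of the two notes on `report_schema` and `summary.policy` might look like the following. The function names are hypothetical and the check is intentionally strict. It assumes the consumer only understands report v1.

```python
import json
from typing import Any

REPORT_SCHEMA_V1 = "sbom-diff-risk.report.v1"


def load_report_v1(text: str) -> dict[str, Any]:
    """Parse a JSON report and refuse anything that is not declared as report v1."""
    report = json.loads(text)
    if not isinstance(report, dict):
        raise ValueError("report must be a JSON object")
    schema = report.get("report_schema")
    if schema != REPORT_SCHEMA_V1:
        # Select the contract first; do not guess field meanings for other schemas.
        raise ValueError(f"unsupported report_schema: {schema!r}")
    return report


def policy_was_applied(report: dict[str, Any]) -> bool:
    """Missing ``summary.policy`` means policy was not applied."""
    summary = report.get("summary")
    return isinstance(summary, dict) and "policy" in summary
```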
90 changes: 90 additions & 0 deletions tools/sbom-diff-and-risk/docs/v1.1-input-and-policy-semantics.md
@@ -0,0 +1,90 @@
# v1.1 technical route: Input and Policy Semantics

v1.1 narrows ambiguity at the two boundaries that CI consumers depend on:
what an input parser actually supports, and why a policy decision was emitted.
It does not add another scientific-computing mini-project or move the tool out
of this monorepo.

## Delivery status

| Contract | Current v1.1 status |
| --- | --- |
| Real input support matrix | Implemented in [parser-boundaries.md](parser-boundaries.md) from the registered parsers |
| Policy schema identifier | Implemented as `sbom-diff-risk.policy.v1`; legacy policy files remain readable |
| Report schema identifier and compatibility tests | Implemented as `sbom-diff-risk.report.v1` across checked-in full-report fixtures |
| Per-decision rule, evidence, reason, and confidence | Implemented additively in report v1 policy finding objects |
| Component identity canonicalization | Next implementation slice; target semantics are fixed below |

## Component identity target

The canonical identity record will expose these dimensions separately:

- `ecosystem`: trimmed and normalized to a registered ecosystem identifier.
- `package_name`: normalized with ecosystem-aware rules. PyPI names use PEP
503 normalization; other ecosystems require explicit test-backed rules.
- `version`: trimmed but otherwise preserved as observed. The tool will not
infer semantic equivalence between unrelated version schemes.
- `purl`: parsed and normalized when present, while retaining the observed purl
in component evidence for auditability.
- `component_key`: versionless package identity used to align before and after
inputs. A version change remains a change, not an add plus remove.

Identity authority remains `purl`, then `bom_ref`, then the normalized
`(ecosystem, package_name)` coordinate. A parseable purl is authoritative for
its ecosystem and package coordinate. Explicit metadata that disagrees with
that coordinate is a conflict, not an alternative identity.
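
The authority order and the versionless key could be sketched as below. This describes target semantics that are not yet implemented, so every name here is hypothetical. The `purl:` prefix mirrors the `component_key` shown in the policy explainability examples; the `bom-ref:` and `coord:` prefixes are invented for illustration. The purl handling is a simplification that only strips the subpath, qualifiers, and a version on the final path segment, and the PyPI rule is the PEP 503 name normalization.

```python
import re
from typing import Optional


def normalize_pypi_name(name: str) -> str:
    """PEP 503: collapse runs of '-', '_', '.' into '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name.strip()).lower()


def versionless_purl(purl: str) -> str:
    """Drop subpath, qualifiers, and version from a purl (simplified handling)."""
    base = purl.strip().split("#", 1)[0].split("?", 1)[0]
    head, _, last = base.rpartition("/")
    if not head.startswith("pkg:") or not last:
        raise ValueError(f"not a parseable purl: {purl!r}")
    name = last.rsplit("@", 1)[0] if "@" in last else last
    if head.lower() == "pkg:pypi":
        return "pkg:pypi/" + normalize_pypi_name(name)
    return f"{head}/{name}"


def component_key(
    *,
    purl: Optional[str] = None,
    bom_ref: Optional[str] = None,
    ecosystem: Optional[str] = None,
    name: Optional[str] = None,
) -> str:
    """Pick the identity by authority: purl, then bom_ref, then the coordinate."""
    if purl:
        return "purl:" + versionless_purl(purl)
    if bom_ref:
        return "bom-ref:" + bom_ref.strip()
    if ecosystem and name:
        eco = ecosystem.strip().lower()
        pkg = normalize_pypi_name(name) if eco == "pypi" else name.strip()
        return f"coord:{eco}/{pkg}"
    raise ValueError("component has no usable identity")
```

Because the key carries no version, `pkg:pypi/Example_Package@1.2.3` and `pkg:pypi/example-package@2.0` align to the same component, and the version difference is reported as a change.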

Within one input:

- two records with the same key and identical normalized metadata fail closed
as `duplicate_component`;
- two records with the same key but different normalized metadata fail closed
as `conflicting_metadata`;
- conflicting ecosystem, package name, or version information between a purl
and explicit fields also fails closed as `conflicting_metadata`;
- metadata differences across the before and after inputs remain normal diff
evidence and do not become same-input conflicts.
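
The same-input rules above can be sketched as an index that refuses a repeated key. The diagnostic codes `duplicate_component` and `conflicting_metadata` come from the list; `SameInputConflict` and `index_components` are hypothetical names, and metadata is assumed to be already normalized before comparison.

```python
from typing import Any


class SameInputConflict(ValueError):
    """Hypothetical diagnostic carrying a stable code and the offending key."""

    def __init__(self, code: str, key: str) -> None:
        super().__init__(f"{code}: {key}")
        self.code = code
        self.key = key


def index_components(
    records: list[tuple[str, dict[str, Any]]],
) -> dict[str, dict[str, Any]]:
    """Index one input by component key, failing closed on any repeated key."""
    indexed: dict[str, dict[str, Any]] = {}
    for key, metadata in records:
        if key in indexed:
            # Identical metadata is an exact duplicate; anything else is a conflict.
            code = (
                "duplicate_component"
                if indexed[key] == metadata
                else "conflicting_metadata"
            )
            raise SameInputConflict(code, key)
        indexed[key] = metadata
    return indexed
```

Each input is indexed on its own, so differences between the before and after indexes never reach this check and stay ordinary diff evidence.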

The next code slice should introduce a typed canonical identity object and
diagnostic error codes before changing report presentation. Cross-format tests
must cover CycloneDX to SPDX alignment, PyPI name normalization, versioned
purls, exact duplicates, and conflicting metadata.

## Policy and decision contract

`policy_schema` identifies the serialized policy family. The existing integer
`version` continues to select rule capabilities and remains compatible with
v1.0 policy files.

Every emitted policy finding includes:

- `matched_rule_id` for the exact matching rule;
- `exact_evidence` for the local values used in the comparison;
- `decision_reason` as a stable machine-readable reason code;
- `confidence_level` for the recorded evidence source.

The legacy `rule_id`, `policy_rule`, `matched_threshold`, and `observed_value`
fields remain in report schema v1. Consumers can migrate to the grouped fields
without a flag day.
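
A consumer that wants to migrate gradually could prefer the grouped fields and fall back to the legacy ones. The helper below is a sketch with a hypothetical name and output shape; the input field names are the ones listed above.

```python
from typing import Any


def decision_evidence(finding: dict[str, Any]) -> dict[str, Any]:
    """Read one policy finding, preferring grouped fields over legacy ones."""
    exact = finding.get("exact_evidence")
    if isinstance(exact, dict):
        threshold = exact.get("matched_threshold")
        observed = exact.get("observed_value")
    else:
        # Legacy flat fields remain available in report schema v1.
        threshold = finding.get("matched_threshold")
        observed = finding.get("observed_value")

    rule = finding.get("matched_rule_id")
    if rule is None:
        rule = finding.get("policy_rule", finding.get("rule_id"))

    return {
        "rule": rule,
        "threshold": threshold,
        "observed": observed,
        "reason": finding.get("decision_reason"),
        "confidence": finding.get("confidence_level"),
    }
```

The same helper works on findings written before and after the grouped fields were added, which is what makes a flag day unnecessary.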

## Compatibility gate

Before v1.1 release:

1. Every checked-in full JSON report must declare
`sbom-diff-risk.report.v1` and satisfy required-field/type tests.
2. Every canonical policy example must declare
`sbom-diff-risk.policy.v1`; legacy omission remains a tested compatibility
path.
3. Unknown policy schema identifiers must fail closed.
4. Policy decisions in local, provenance, and Scorecard fixtures must expose
rule, evidence, reason, and confidence fields.
5. Existing v1.0 fields remain readable for the lifetime of report schema v1.
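
Gates 1 and 4 could be checked per fixture with a helper along these lines. It is a sketch, not the project's test suite: `compatibility_violations` is a hypothetical name, and it assumes `blocking_findings` and `warning_findings` are top-level lists in the full report, as they are in the policy sidecar.

```python
from typing import Any

REPORT_SCHEMA_V1 = "sbom-diff-risk.report.v1"
REQUIRED_DECISION_FIELDS = (
    "matched_rule_id",
    "exact_evidence",
    "decision_reason",
    "confidence_level",
)


def compatibility_violations(report: dict[str, Any]) -> list[str]:
    """Return human-readable gate violations for one full JSON report."""
    violations: list[str] = []
    if report.get("report_schema") != REPORT_SCHEMA_V1:
        violations.append(f"report_schema must be {REPORT_SCHEMA_V1}")

    # Assumption: policy findings live in these two top-level lists.
    for section in ("blocking_findings", "warning_findings"):
        for index, finding in enumerate(report.get(section, [])):
            if not isinstance(finding, dict):
                violations.append(f"{section}[{index}] must be an object")
                continue
            for field in REQUIRED_DECISION_FIELDS:
                if field not in finding:
                    violations.append(f"{section}[{index}] is missing {field}")
    return violations
```

A fixture test could then assert that the returned list is empty for every checked-in local, provenance, and Scorecard report.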

## Repository boundary

`sbom-diff-and-risk` remains under `tools/` in this monorepo for v1.1. Naming
friction alone is not a migration trigger. A separate repository should be
considered only after real third-party adoption creates an independent release
cadence, issue stream, or packaging boundary. Production PyPI publishing is a
separate decision and is not implied by this route.