14 changes: 13 additions & 1 deletion CHANGELOG.md
@@ -6,13 +6,23 @@ All notable user-visible changes should be recorded here.

### Added

- None yet.
- Added stable JSON finding identity fields: `finding_id` and
`episode_index`.
- Added a separated-burst syslog report-contract fixture where one source IP
produces two time-separated brute-force findings.
- Added detector regression coverage for stable episode identity under unsorted
input order and inclusive window-boundary behavior.
- Added parser regression coverage for malformed source-IP token
classification.

### Changed

- Detector rules now emit separate findings for time-separated detection
episodes within the same rule subject instead of collapsing each subject to a
single best window.
- Bumped the JSON report artifact contract from `loglens.report.v2` /
`schema_version` 2 to `loglens.report.v3` / `schema_version` 3 for finding
identity fields.

### Fixed

@@ -22,6 +32,8 @@ All notable user-visible changes should be recorded here.

- Documented detection episode semantics in the rule catalog and report artifact
contract notes.
- Added the v0.6 Detection Episode Semantics release note and schema v2 to v3
migration guidance.

## v0.5.0

11 changes: 7 additions & 4 deletions README.md
@@ -14,7 +14,9 @@ or attribution:

```json
{
  "finding_id": "finding:brute_force:4e6aec401a0d45ca",
  "rule_id": "brute_force",
  "episode_index": 1,
  "subject_kind": "source_ip",
  "subject": "203.0.113.10",
  "grouping_key": "source_ip",
@@ -31,7 +33,7 @@ or attribution:

**Release posture:** Early reviewer-stable release with a narrow Linux authentication evidence contract. Parser and detection coverage remain intentionally narrow.

Reviewing the project quickly? Start with [`docs/reviewer-path.md`](./docs/reviewer-path.md), [`docs/reviewer-brief.md`](./docs/reviewer-brief.md), and the [`v0.5 Evidence Explainability release note`](./docs/release-v0.5.0.md). The [`quality gates map`](./docs/quality-gates.md) links claims to tests and fixtures. For detection reasoning, follow the [`one-page incident-style case`](./docs/incident-style-case.md), then use the full [`Linux auth brute-force case study`](./docs/case-study-linux-auth-bruteforce.md), [`rule catalog`](./docs/rule-catalog.md), and [`false-positive taxonomy`](./docs/false-positive-taxonomy.md) for depth. For local scale expectations, see the [`performance envelope`](./docs/performance-envelope.md).
Reviewing the project quickly? Start with [`docs/reviewer-path.md`](./docs/reviewer-path.md), [`docs/reviewer-brief.md`](./docs/reviewer-brief.md), the [`v0.5 Evidence Explainability release note`](./docs/release-v0.5.0.md), and the [`v0.6 Detection Episode Semantics release note`](./docs/release-v0.6.0.md). The [`quality gates map`](./docs/quality-gates.md) links claims to tests and fixtures. For detection reasoning, follow the [`one-page incident-style case`](./docs/incident-style-case.md), then use the full [`Linux auth brute-force case study`](./docs/case-study-linux-auth-bruteforce.md), [`rule catalog`](./docs/rule-catalog.md), and [`false-positive taxonomy`](./docs/false-positive-taxonomy.md) for depth. For local scale expectations, see the [`performance envelope`](./docs/performance-envelope.md).

For a shorter external review entry point focused on uncertainty handling, read
[How LogLens Treats Parser Uncertainty as Evidence](./docs/case-study-parser-uncertainty-as-evidence.md).
@@ -62,7 +64,7 @@ LogLens includes two minimal GitHub Actions workflows:
- `CI` builds and tests the project on `ubuntu-latest` and `windows-latest`
- `CodeQL` runs GitHub code scanning for C/C++ on pushes, pull requests, and a weekly schedule

Both workflows are intended to stay stable enough to require on pull requests to `main`. Regression coverage is backed by sanitized parser fixture matrices plus golden report-contract fixtures for `report.md`, `report.json`, and optional CSV outputs. Release-facing documentation is split across [`CHANGELOG.md`](./CHANGELOG.md), [`docs/release-process.md`](./docs/release-process.md), [`docs/release-v0.1.0.md`](./docs/release-v0.1.0.md), [`docs/release-v0.3.0.md`](./docs/release-v0.3.0.md), [`docs/release-v0.5.0.md`](./docs/release-v0.5.0.md), and the repository's GitHub release notes. The repository hardening note is in [`docs/repo-hardening.md`](./docs/repo-hardening.md), and vulnerability reporting guidance is in [`SECURITY.md`](./SECURITY.md).
Both workflows are intended to stay stable enough to require on pull requests to `main`. Regression coverage is backed by sanitized parser fixture matrices plus golden report-contract fixtures for `report.md`, `report.json`, and optional CSV outputs. Release-facing documentation is split across [`CHANGELOG.md`](./CHANGELOG.md), [`docs/release-process.md`](./docs/release-process.md), [`docs/release-v0.1.0.md`](./docs/release-v0.1.0.md), [`docs/release-v0.3.0.md`](./docs/release-v0.3.0.md), [`docs/release-v0.5.0.md`](./docs/release-v0.5.0.md), [`docs/release-v0.6.0.md`](./docs/release-v0.6.0.md), and the repository's GitHub release notes. The repository hardening note is in [`docs/repo-hardening.md`](./docs/repo-hardening.md), and vulnerability reporting guidance is in [`SECURITY.md`](./SECURITY.md).

## Threat Model

@@ -86,8 +88,9 @@ LogLens currently detects:

Each rule can emit multiple findings for the same subject when matching
evidence appears in time-separated detector episodes. Report consumers should
use `window_start`, `window_end`, and `evidence_event_ids` rather than assuming
one finding per `rule_id` and subject.
use `finding_id`, `episode_index`, `window_start`, `window_end`, and
`evidence_event_ids` rather than assuming one finding per `rule_id` and
subject.

LogLens currently parses and reports these additional auth patterns beyond the core detector inputs:

5 changes: 4 additions & 1 deletion docs/quality-gates.md
@@ -15,7 +15,8 @@ The main review principle is:
| Parser coverage is visible | [`parser-coverage-notes.md`](./parser-coverage-notes.md), [`tests/fixtures/parser_matrix/noisy_auth_expected.json`](../tests/fixtures/parser_matrix/noisy_auth_expected.json) | `test_parser` compares noisy-auth coverage output to the checked-in expected summary | Reviewer can see parsed lines, skipped blanks, warnings, failure categories, and unknown-pattern buckets |
| Unsupported evidence does not silently become detector evidence | [`parser-contract.md`](./parser-contract.md), [`rule-catalog.md`](./rule-catalog.md), [`case-study-linux-auth-bruteforce.md`](./case-study-linux-auth-bruteforce.md) | `test_parser` covers unknown-pattern warnings; `test_detector` covers signal-boundary behavior | Reviewer can explain why unsupported lines remain warnings instead of findings |
| Report artifacts are deterministic | [`report-artifacts.md`](./report-artifacts.md), report-contract fixtures under [`tests/fixtures/report_contracts`](../tests/fixtures/report_contracts) | `test_report_contracts` compares generated `report.md`, `report.json`, `findings.csv`, and `warnings.csv` against golden fixtures | Reviewer can regenerate reports and see schema or text changes as explicit snapshot diffs |
| Findings are explainable | [`rule-catalog.md`](./rule-catalog.md), [`report-artifacts.md`](./report-artifacts.md) | `test_report` checks JSON finding fields; report-contract fixtures lock `rule_id`, `window_start`, `window_end`, `threshold`, `observed_count`, `grouping_key`, `evidence_event_ids`, and `verdict_boundary` | Reviewer can trace a finding from rule context back to source line IDs and see the non-verdict boundary |
| Findings are explainable | [`rule-catalog.md`](./rule-catalog.md), [`report-artifacts.md`](./report-artifacts.md) | `test_report` checks JSON finding fields; report-contract fixtures lock `finding_id`, `episode_index`, `rule_id`, `window_start`, `window_end`, `threshold`, `observed_count`, `grouping_key`, `evidence_event_ids`, and `verdict_boundary` | Reviewer can trace a finding from rule context back to source line IDs and see the non-verdict boundary |
| Detection episodes are explicit | [`release-v0.6.0.md`](./release-v0.6.0.md), [`rule-catalog.md`](./rule-catalog.md#detection-episode-semantics), [`separated_bursts_syslog`](../tests/fixtures/report_contracts/separated_bursts_syslog) | `test_detector` covers separated episodes, stable identity under unsorted input order, and inclusive boundary windows; `test_report_contracts` locks the two-finding separated-burst report | Reviewer can see one rule and subject emit two non-overlapping findings without treating either as an incident verdict |
| False-positive boundaries are visible | [`rule-catalog.md`](./rule-catalog.md), [`case-study-linux-auth-bruteforce.md`](./case-study-linux-auth-bruteforce.md) | Documentation review gate; detector tests ensure unsupported evidence does not inflate counts | Reviewer can state NAT, internal scanner, lab replay, shared bastion, scheduled admin task, and malformed replay boundaries |
| Parser failure taxonomy is exposed | [`parser-contract.md`](./parser-contract.md), [`parser-conformance-matrix.md`](./parser-conformance-matrix.md), [`report-artifacts.md`](./report-artifacts.md) | `test_parser`, `test_report`, `test_cli`, and `test_report_contracts` cover `failure_categories` and warning `category` output | Reviewer can distinguish timestamp, program, known-program unknown-message, malformed-source-IP, and unsupported-PAM failures |
| Local scale expectations are reproducible | [`performance-envelope.md`](./performance-envelope.md), [`scripts/benchmark-performance-envelope.ps1`](../scripts/benchmark-performance-envelope.ps1) | `pwsh -File scripts/benchmark-performance-envelope.ps1` regenerates sanitized benchmark inputs and local summary artifacts | Reviewer can reproduce the 1k/10k/100k-line envelope and understand its caveats |
@@ -41,6 +42,8 @@ update the matching evidence surface in the same pull request:
- parser behavior change: update parser tests, fixture matrices, and parser docs
- report shape change: update report-contract fixtures and report artifact docs
- rule behavior change: update detector tests, rule catalog, and case-study text when relevant
- episode semantics change: update detector tests, separated-burst report
fixtures, schema migration notes, and rule catalog policy text
- warning taxonomy change: update parser failure taxonomy docs and warning snapshots
- performance-envelope change: rerun the benchmark harness and record the platform/result source

112 changes: 112 additions & 0 deletions docs/release-v0.6.0.md
@@ -0,0 +1,112 @@
# LogLens v0.6.0 - Detection Episode Semantics

Theme: Detection Episode Semantics.

This release note describes the v0.6 report and detector contract. It does not
add new detection rules. It makes repeated time-separated findings for the same
rule subject explicit and reviewable.

## What Changed

- A single `rule_id`, `subject_kind`, and `subject` can emit multiple
non-overlapping findings when evidence appears in separated detector
episodes.
- JSON reports now use `schema: loglens.report.v3` and `schema_version: 3`.
- JSON findings include stable finding identity fields:
  - `finding_id`
  - `episode_index`
- The separated-burst contract fixture demonstrates one source IP producing two
distinct brute-force findings in one report.

## Stable JSON Contract

`loglens.report.v3` keeps the v0.5 explainability fields and adds:

| Field | Meaning |
| --- | --- |
| `finding_id` | Deterministic report-local identifier for the selected finding, derived from the rule, subject, selected window, counts, and evidence event IDs. |
| `episode_index` | 1-based sequence number within the same `rule_id`, `subject_kind`, and `subject`. |

Existing v2 finding fields remain part of the stable explainability surface:

- `rule_id`
- `subject_kind`
- `subject`
- `grouping_key`
- `window_start`
- `window_end`
- `threshold`
- `observed_count`
- `evidence_event_ids`
- `verdict_boundary`

The optional CSV contract is unchanged in v0.6.

## Episode Policy

LogLens v0.6 uses cooldown-separated maximal-window episodes:

| Policy point | v0.6 behavior |
| --- | --- |
| First threshold crossing | Used to decide that an episode candidate is eligible to emit a finding. It is not necessarily the reported window. |
| Maximal window | The reported window is the highest-signal sliding window within the episode candidate. |
| Non-overlapping windows | One rule and subject can emit multiple findings, but selected episode candidates do not reuse the same matching signals. |
| Cooldown merge | Signals separated by an idle gap less than or equal to the rule window stay in the same episode candidate. A larger gap starts a new candidate. |

Episode splitting is a reporting model. It is not an incident boundary.
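
The policy table above can be illustrated with a minimal Python sketch. It is not the LogLens detector (which is not shown in this change). The function names, the 10-minute window, the threshold of 5, the one-attempt-per-minute spacing, the use of event count as "highest signal", and the earliest-window tie-break are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def split_episodes(timestamps, window):
    """Group timestamps into cooldown-separated episode candidates.

    An idle gap less than or equal to `window` keeps a signal in the current
    candidate; a larger gap starts a new one.
    """
    episodes = []
    for ts in sorted(timestamps):  # sorting makes the result input-order independent
        if episodes and ts - episodes[-1][-1] <= window:
            episodes[-1].append(ts)
        else:
            episodes.append([ts])
    return episodes


def best_window(episode, window):
    """Return (start, end, count) of the highest-count window in one candidate.

    Bounds are inclusive: signals exactly `window` apart share a window.
    Ties keep the earliest window (an assumption of this sketch).
    """
    best = (episode[0], episode[0], 1)
    left = 0
    for right, ts in enumerate(episode):
        while ts - episode[left] > window:
            left += 1
        count = right - left + 1
        if count > best[2]:
            best = (episode[left], ts, count)
    return best


def detect(timestamps, window, threshold):
    """Emit at most one finding per episode candidate that meets the threshold."""
    findings = []
    for episode in split_episodes(timestamps, window):
        start, end, count = best_window(episode, window)
        if count >= threshold:  # ">=" is assumed; the real comparison is not shown
            findings.append({
                "episode_index": len(findings) + 1,
                "window_start": start,
                "window_end": end,
                "observed_count": count,
            })
    return findings


if __name__ == "__main__":
    # Two bursts for one subject, supplied out of order (hypothetical spacing).
    day = datetime(2026, 3, 10)
    bursts = [day + timedelta(hours=h, minutes=m) for h in (15, 9) for m in range(5)]
    for finding in detect(bursts, window=timedelta(minutes=10), threshold=5):
        print(finding["episode_index"], finding["window_start"], finding["window_end"])
```

Because each candidate owns a disjoint set of timestamps, the selected windows cannot reuse the same matching signals, which is the non-overlap property the table describes.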

## Separated-Burst Fixture

The fixture
[`tests/fixtures/report_contracts/separated_bursts_syslog/input.log`](../tests/fixtures/report_contracts/separated_bursts_syslog/input.log)
contains one sanitized source IP with:

- five failed SSH attempts from `09:00:00` through `09:04:00`
- five failed SSH attempts from `15:00:00` through `15:04:00`

The expected
[`report.json`](../tests/fixtures/report_contracts/separated_bursts_syslog/report.json)
contains two `brute_force` findings for the same subject:

- `episode_index: 1`, window `2026-03-10 09:00:00` to
`2026-03-10 09:04:00`
- `episode_index: 2`, window `2026-03-10 15:00:00` to
`2026-03-10 15:04:00`

This fixture locks the main v0.6 behavior: repeated separated bursts are no
longer collapsed to one best window.

## Schema v2 to v3 Migration

Consumers should treat `schema` and `schema_version` as the report shape gate:

- v2: `loglens.report.v2`, `schema_version: 2`
- v3: `loglens.report.v3`, `schema_version: 3`

Consumers that keyed findings by `rule_id` and `subject` should migrate to
`finding_id`, or include `episode_index`, `window_start`, `window_end`, and
`evidence_event_ids` in their own composite key.
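
A consumer-side keying change might look like the following sketch. The helper names `finding_key` and `index_findings` are hypothetical, and the sample values in the usage example are placeholders rather than real LogLens output.

```python
def finding_key(finding):
    """Return a key that stays unique when one rule and subject emit several findings.

    Prefers `finding_id`; otherwise falls back to a composite key built from
    the fields named in the migration guidance.
    """
    if "finding_id" in finding:
        return finding["finding_id"]
    return (
        finding["rule_id"],
        finding["subject"],
        finding.get("episode_index"),  # absent in v2 reports
        finding["window_start"],
        finding["window_end"],
        tuple(finding["evidence_event_ids"]),
    )


def index_findings(report):
    """Index findings by key and fail loudly on a collision."""
    index = {}
    for finding in report["findings"]:
        key = finding_key(finding)
        if key in index:
            raise ValueError(f"duplicate finding key: {key!r}")
        index[key] = finding
    return index


if __name__ == "__main__":
    # Placeholder data: same rule and subject, two separated episodes.
    report = {
        "findings": [
            {"finding_id": "finding:brute_force:aaaaaaaaaaaaaaaa",
             "rule_id": "brute_force", "subject": "203.0.113.10"},
            {"finding_id": "finding:brute_force:bbbbbbbbbbbbbbbb",
             "rule_id": "brute_force", "subject": "203.0.113.10"},
        ]
    }
    print(len(index_findings(report)))
```

Keying by `(rule_id, subject)` alone would silently overwrite the first episode with the second, which is the failure this migration avoids.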

## Validation Surface

v0.6 is covered by:

- detector tests for separated brute-force, multi-user probing, and sudo-burst
episodes
- detector tests for stable episode identity under unsorted input order
- detector tests for inclusive rule-window boundaries
- parser tests for malformed source-IP token classification
- report tests for `finding_id`, `episode_index`, and schema v3 output
- golden report-contract fixtures for Markdown, JSON, and optional CSV reports

## Non-Claims

LogLens v0.6 findings remain bounded triage signals over normalized local
evidence. The release preserves these explicit non-claims:

- no compromise verdict
- no attribution
- no blocking recommendation
- no cross-host correlation
36 changes: 31 additions & 5 deletions docs/report-artifacts.md
@@ -40,11 +40,13 @@ The JSON report keeps parser observability visible next to findings:
- `findings`
- `warnings`

Finding objects contain `rule_id`, `rule`, `subject_kind`, `subject`, `grouping_key`, `threshold`, `observed_count`, `event_count`, `window_start`, `window_end`, `evidence_event_ids`, `verdict_boundary`, `usernames`, and `summary`.
Finding objects contain `finding_id`, `rule_id`, `rule`, `episode_index`, `subject_kind`, `subject`, `grouping_key`, `threshold`, `observed_count`, `event_count`, `window_start`, `window_end`, `evidence_event_ids`, `verdict_boundary`, `usernames`, and `summary`.

The stable finding explainability surface for `loglens.report.v2` is:
The stable finding explainability surface for `loglens.report.v3` is:

- `finding_id`
- `rule_id`
- `episode_index`
- `subject_kind`
- `subject`
- `grouping_key`
@@ -61,10 +63,20 @@ fixtures explicitly.

`evidence_event_ids` are deterministic local event identifiers derived from the source line number, formatted as `line:<number>`. They let reviewers trace a finding back to the normalized input events that satisfied the rule window without implying global event identity.
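
As a reviewer-side illustration, the sketch below maps `line:<number>` identifiers back to source lines. The helper names are hypothetical, and treating the number as a 1-based line index is an assumption of this sketch.

```python
import re

# Matches the documented `line:<number>` shape; ASCII digits only.
_EVENT_ID = re.compile(r"line:([0-9]+)")


def evidence_line_numbers(finding):
    """Parse `evidence_event_ids` into integer line numbers."""
    numbers = []
    for event_id in finding["evidence_event_ids"]:
        match = _EVENT_ID.fullmatch(event_id)
        if match is None:
            raise ValueError(f"unexpected evidence event id: {event_id!r}")
        numbers.append(int(match.group(1)))
    return numbers


def evidence_lines(finding, log_lines):
    """Return the source lines behind a finding (assumes 1-based numbering)."""
    return [log_lines[n - 1] for n in evidence_line_numbers(finding)]


if __name__ == "__main__":
    finding = {"evidence_event_ids": ["line:2", "line:3"]}
    log_lines = ["first line", "second line", "third line"]
    print(evidence_lines(finding, log_lines))
```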

`finding_id` is a deterministic report-local finding identifier derived from
the rule, subject, selected window, counts, and evidence event IDs. It is
stable for the same normalized evidence and rule output, but it is not a global
case identifier.
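
To illustrate what "deterministic" means here, the sketch below derives an identifier from the same kinds of inputs. It is not the LogLens derivation, which is not shown in this change, and it will not reproduce real `finding_id` values. The hash function, field order, separator, evidence sorting, and 16-character truncation are all assumptions; only the `finding:<rule_id>:<hex>` shape mirrors the README example.

```python
import hashlib


def sketch_finding_id(rule_id, subject_kind, subject, window_start, window_end,
                      threshold, observed_count, evidence_event_ids):
    """Build a deterministic identifier from a finding's defining fields.

    Hypothetical scheme for illustration only.
    """
    parts = [
        rule_id,
        subject_kind,
        subject,
        window_start,
        window_end,
        str(threshold),
        str(observed_count),
        # Sorted so the result does not depend on evidence ordering (assumption).
        *sorted(evidence_event_ids),
    ]
    # The unit separator keeps adjacent fields from running together.
    canonical = "\x1f".join(parts)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"finding:{rule_id}:{digest}"


if __name__ == "__main__":
    first = sketch_finding_id(
        "brute_force", "source_ip", "203.0.113.10",
        "2026-03-10 09:00:00", "2026-03-10 09:04:00",
        5, 5, ["line:1", "line:2", "line:3", "line:4", "line:5"],
    )
    print(first)
```

The same normalized evidence always yields the same identifier, while a different window or evidence set yields a different one, which is why the value can distinguish episodes without acting as a global case identifier.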

`episode_index` is a 1-based sequence number within one `rule_id`,
`subject_kind`, and `subject`. It is meant for reviewer navigation when a rule
emits more than one finding for the same subject.

Consumers should not assume that `rule_id` plus `subject` is unique within a
report. A rule can emit multiple findings for the same subject when matching
evidence appears in time-separated detector episodes. Use `window_start`,
`window_end`, and `evidence_event_ids` to distinguish episode-level findings.
evidence appears in time-separated detector episodes. Use `finding_id`,
`episode_index`, `window_start`, `window_end`, and `evidence_event_ids` to
distinguish episode-level findings.

`verdict_boundary` is a stable token that states what the finding must not be
read as. It keeps machine-readable findings aligned with LogLens's triage
@@ -79,7 +91,20 @@ Warning objects contain the original `line_number`, parser `category`, and parse
`schema` and `schema_version` identify the report artifact contract, not the
application release. They are intended for downstream tooling that needs a
stable way to reject incompatible report shapes. The current JSON contract is
`loglens.report.v2` with `schema_version` set to `2`.
`loglens.report.v3` with `schema_version` set to `3`.

### Schema v2 to v3 Migration

`loglens.report.v3` keeps the v2 finding explainability fields and adds:

- `finding_id`
- `episode_index`

Downstream consumers should treat `schema` and `schema_version` as the report
shape gate. Consumers that keyed findings by `rule_id` and `subject` should
move to `finding_id`, or include `episode_index`, `window_start`, `window_end`,
and `evidence_event_ids` in their own composite key. The optional CSV contract
is unchanged in v3.
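
A downstream shape gate could be as small as the sketch below. The function name `load_report` and the `SUPPORTED_SHAPES` set are hypothetical, and the sketch assumes `schema_version` is serialized as a JSON integer.

```python
import json

# Report shapes this hypothetical consumer understands.
SUPPORTED_SHAPES = {("loglens.report.v3", 3)}


def load_report(text):
    """Parse a JSON report and reject shapes the consumer was not written for."""
    report = json.loads(text)
    shape = (report.get("schema"), report.get("schema_version"))
    if shape not in SUPPORTED_SHAPES:
        raise ValueError(f"unsupported report shape: {shape!r}")
    return report


if __name__ == "__main__":
    sample = '{"schema": "loglens.report.v3", "schema_version": 3, "findings": []}'
    print(load_report(sample)["schema"])
```

Checking both fields together means a v2 report fails fast instead of being read as if every finding carried `finding_id` and `episode_index`.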

Parser failure categories are stable reviewer-facing buckets for unsupported
lines: `unknown_timestamp`, `unknown_program`,
@@ -111,6 +136,7 @@ The report contracts are backed by generated fixture artifacts:
| [`journalctl_short_full`](../tests/fixtures/report_contracts/journalctl_short_full) | `report.md`, `report.json`, `findings.csv`, `warnings.csv` |
| [`multi_host_syslog_legacy`](../tests/fixtures/report_contracts/multi_host_syslog_legacy) | `report.md`, `report.json`, `findings.csv`, `warnings.csv` |
| [`multi_host_journalctl_short_full`](../tests/fixtures/report_contracts/multi_host_journalctl_short_full) | `report.md`, `report.json`, `findings.csv`, `warnings.csv` |
| [`separated_bursts_syslog`](../tests/fixtures/report_contracts/separated_bursts_syslog) | `report.md`, `report.json`, `findings.csv`, `warnings.csv` |

The enforcement lives in [`tests/test_report_contracts.cpp`](../tests/test_report_contracts.cpp). Parser or rule changes that alter report artifacts must update these snapshots explicitly. This includes changes to stable finding explainability fields, parser coverage fields, warning categories, CSV columns, or Markdown report layout. The focused report writer tests live in [`tests/test_report.cpp`](../tests/test_report.cpp).
