Parser contract

LogLens treats parser behavior as reviewable output, not as a hidden implementation detail. A line is either recognized as a typed event, skipped as blank input, or surfaced as a warning with coverage telemetry.

The guiding rule is:

Parser observability > silent detection claims.

For the v0.5 Evidence Explainability Release, this contract is part of the release-facing reviewer path. Start with docs/release-v0.5.0.md for the release checklist, then use this page to inspect supported inputs, unsupported-line handling, parser warning categories, and the detection signal boundary.

Supported input modes

| Mode | Typical source | Timestamp behavior | Review anchor |
|---|---|---|---|
| syslog_legacy | auth.log / secure style lines such as Mar 10 08:11:22 example-host sshd[1234]: ... | Requires an explicit four-digit year from --year or timestamp.assume_year | assets/parser_fixture_matrix_syslog.log |
| journalctl_short_full | journalctl --output=short-full style lines such as Tue 2026-03-10 08:11:22 UTC example-host sshd[1234]: ... | Uses the embedded year and a supported timezone token | assets/parser_fixture_matrix_journalctl_short_full.log |

Supported timezone tokens for journalctl_short_full are intentionally narrow: UTC, GMT, Z, and numeric offsets such as +0000 or +00:00.
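The two timestamp rules above can be expressed as small checks. This is a minimal C++ illustration, not LogLens code: is_supported_tz_token and resolve_syslog_year are hypothetical names. It also assumes that any [+-]HHMM or [+-]HH:MM offset is accepted, which extends the +0000 and +00:00 examples, and that --year takes precedence over timestamp.assume_year.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical check for the narrow journalctl_short_full timezone set.
// Assumption: any [+-]HHMM or [+-]HH:MM numeric offset is accepted.
bool is_supported_tz_token(const std::string& tok) {
    if (tok == "UTC" || tok == "GMT" || tok == "Z") {
        return true;
    }
    if (tok.empty() || (tok[0] != '+' && tok[0] != '-')) {
        return false;
    }
    // True when tok[from .. from + count) are all ASCII digits.
    auto digits = [&tok](std::size_t from, std::size_t count) {
        for (std::size_t i = from; i < from + count; ++i) {
            if (!std::isdigit(static_cast<unsigned char>(tok[i]))) return false;
        }
        return true;
    };
    if (tok.size() == 5) return digits(1, 4);                                   // +0000
    if (tok.size() == 6) return digits(1, 2) && tok[3] == ':' && digits(4, 2);  // +00:00
    return false;
}

// Hypothetical year resolution for syslog_legacy, whose headers carry no year.
// Assumption: the --year value wins over timestamp.assume_year when both are set.
std::optional<int> resolve_syslog_year(std::optional<int> cli_year,
                                       std::optional<int> config_year) {
    std::optional<int> year = cli_year ? cli_year : config_year;
    if (year && *year >= 1000 && *year <= 9999) {
        return year;  // explicit four-digit year
    }
    return std::nullopt;  // no guessing: the caller must report this instead
}
```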

Recognized event families

The parser currently recognizes common authentication evidence from:

  • sshd
  • sudo
  • su
  • login
  • pam_unix(...)
  • selected pam_faillock(...) variants
  • selected pam_sss(...) variants

Recognized SSH failure families include:

  • failed password
  • invalid user, plus illegal user, which is treated as an OpenSSH wording variant of invalid user
  • failed publickey
  • failed keyboard-interactive/pam
  • failed-none invalid-user probing
  • input_userauth_request invalid/illegal-user preauth traces
  • sshd-owned PAM authentication-failure lines
  • maximum-authentication-attempts-exceeded lines

Maximum-authentication-attempts and sshd-owned PAM authentication-failure lines may carry OpenSSH's leading error: marker and still normalize into the same event family. The invalid- or illegal-user variants of failed-none probing, input_userauth_request preauth traces, keyboard-interactive failures, sshd-owned PAM authentication failures, and maximum-authentication-attempts-exceeded lines all normalize into ssh_invalid_user events. Recognized SSH failures can become detection signals through the configured signal mapping.
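A few of these normalization rules can be sketched as prefix checks on the sshd message text. This is a hypothetical C++ illustration, not the real sshd handler. normalize_sshd_message and starts_with are invented names, only a handful of families are modeled, and returning the event type as a string simplifies the typed Event.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: true when s begins with prefix.
static bool starts_with(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}

// Hypothetical sketch of a subset of the sshd normalization rules.
// Returns an event type name, or std::nullopt for wording that should
// surface as a parser warning instead of an event.
std::optional<std::string> normalize_sshd_message(std::string_view msg) {
    // OpenSSH may prefix some lines with "error: "; the family is unchanged.
    constexpr std::string_view kErrorMarker = "error: ";
    if (starts_with(msg, kErrorMarker)) msg.remove_prefix(kErrorMarker.size());

    // "Illegal user" is an older OpenSSH wording of "Invalid user".
    if (starts_with(msg, "Invalid user ") || starts_with(msg, "Illegal user ")) {
        return "ssh_invalid_user";
    }

    constexpr std::string_view kMaxAuth = "maximum authentication attempts exceeded for ";
    if (starts_with(msg, kMaxAuth)) {
        std::string_view rest = msg.substr(kMaxAuth.size());
        // Invalid- or illegal-user variants normalize into ssh_invalid_user.
        if (starts_with(rest, "invalid user ") || starts_with(rest, "illegal user ")) {
            return "ssh_invalid_user";
        }
        return "ssh_max_auth_tries";
    }

    if (starts_with(msg, "Failed password for ")) return "ssh_failed_password";
    if (starts_with(msg, "Failed publickey for ")) return "ssh_failed_publickey";
    return std::nullopt;  // unmodeled wording stays visible as a warning
}
```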

Recognized success or audit families include accepted password, accepted publickey, accepted keyboard-interactive/pam, sudo command audit lines, sudo password failures, sudoers policy denials, su success/failure audit lines, selected util-linux login failures and session records, and selected PAM session/auth lines. login failures do not infer a network source IP and remain lower-confidence pam_auth_failure context.

The selected login wording is anchored to the upstream login.c syslog strings. Localized or otherwise unmodeled variants remain visible as login_other warnings.

Line handling contract

| Input line outcome | Parser behavior | Report behavior |
|---|---|---|
| Recognized auth line | Emits a typed Event with timestamp, hostname, program, optional pid, message, source IP, username, event type, and line number | Can contribute to summaries, reports, and configured detection signals |
| Blank line | Skips the line and increments skipped_blank_lines | Does not become a warning or parsed event |
| Malformed header | Emits a parser warning with the original line number, structural reason, and unknown_timestamp category | Counts toward unparsed_lines, failure_categories, and top_unknown_patterns |
| Well-formed but unsupported auth pattern | Emits a parser warning with a failure category and unknown-pattern bucket | Stays visible as telemetry instead of being silently ignored |

This is the main trust boundary: unsupported input should remain inspectable, even when it does not produce a finding.
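Here is a minimal sketch of the three-way outcome. It assumes a blank line means an empty or whitespace-only line and uses a caller-supplied recognizer in place of AuthLogParser. LineOutcome, Coverage, summarize, and the parsed_lines counter are hypothetical. skipped_blank_lines and unparsed_lines follow the names in the table. The sketch shows that every line lands in exactly one counter, so no line can disappear silently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical three-way outcome for a single input line.
enum class LineOutcome { Event, SkippedBlank, Warning };

struct Coverage {
    std::size_t parsed_lines = 0;         // hypothetical name
    std::size_t skipped_blank_lines = 0;  // blank input, never a warning
    std::size_t unparsed_lines = 0;       // lines that produced a parser warning
};

// Assumption: "blank" means empty or whitespace-only.
bool is_blank(const std::string& line) {
    return line.find_first_not_of(" \t\r") == std::string::npos;
}

// `recognize` stands in for the real parser: it reports whether a
// non-blank line matched a supported pattern.
template <typename Recognizer>
Coverage summarize(const std::vector<std::string>& lines, Recognizer recognize) {
    Coverage cov;
    for (const auto& line : lines) {
        LineOutcome outcome = is_blank(line)   ? LineOutcome::SkippedBlank
                              : recognize(line) ? LineOutcome::Event
                                                : LineOutcome::Warning;
        switch (outcome) {
            case LineOutcome::Event: ++cov.parsed_lines; break;
            case LineOutcome::SkippedBlank: ++cov.skipped_blank_lines; break;
            case LineOutcome::Warning: ++cov.unparsed_lines; break;
        }
    }
    return cov;
}
```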

Internal parsing pipeline

AuthLogParser remains the single public parser interface. Its implementation is split into internal modules with one-way flow:

  1. timestamp_parser parses the selected input mode's timestamp and hostname.
  2. source_envelope_parser extracts the program tag, optional pid, and raw message.
  3. program_dispatch selects a registered sshd, PAM, sudo, or su handler.
  4. The selected handler either emits a normalized event or returns a structured failure result.
  5. failure_classifier assigns malformed-source and unsupported-pattern telemetry without turning unsupported evidence into an event.
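Stages 1 and 2 for syslog_legacy input can be sketched as two small functions. Each one returns either a structured value or an empty result that signals failure. This is an illustrative C++ sketch, not the internal timestamp_parser or source_envelope_parser modules. Header, Envelope, parse_header, and parse_envelope are hypothetical names, and the sketch slices the timestamp by fixed width rather than validating it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stage outputs.
struct Header {
    std::string timestamp;
    std::string hostname;
    std::string_view rest;  // views into the caller's line, which must outlive it
};

struct Envelope {
    std::string program;
    std::optional<int> pid;
    std::string message;
};

// Stage 1 (timestamp_parser role), syslog_legacy only: a fixed-width
// "Mar 10 08:11:22" timestamp, one space, then the hostname.
std::optional<Header> parse_header(std::string_view line) {
    if (line.size() < 16 || line[15] != ' ') return std::nullopt;
    std::string_view after = line.substr(16);
    std::size_t space = after.find(' ');
    if (space == std::string_view::npos || space == 0) return std::nullopt;
    return Header{std::string(line.substr(0, 15)),
                  std::string(after.substr(0, space)),
                  after.substr(space + 1)};
}

// Stage 2 (source_envelope_parser role): "sshd[1234]: message" or "sudo: message".
std::optional<Envelope> parse_envelope(std::string_view rest) {
    std::size_t colon = rest.find(": ");
    if (colon == std::string_view::npos || colon == 0) return std::nullopt;
    std::string_view tag = rest.substr(0, colon);

    Envelope env;
    std::size_t bracket = tag.find('[');
    if (bracket == std::string_view::npos) {
        env.program = std::string(tag);  // tag without a pid, e.g. "sudo"
    } else {
        if (bracket == 0 || tag.back() != ']') return std::nullopt;
        std::string_view digits = tag.substr(bracket + 1, tag.size() - bracket - 2);
        if (digits.empty() || digits.size() > 9) return std::nullopt;  // avoid int overflow
        int pid = 0;
        for (char c : digits) {
            if (c < '0' || c > '9') return std::nullopt;
            pid = pid * 10 + (c - '0');
        }
        env.program = std::string(tag.substr(0, bracket));
        env.pid = pid;
    }
    env.message = std::string(rest.substr(colon + 2));
    return env;
}
```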

All registered program handlers return the same internal result shape: matched, optional event, failure_category, and reason. The registry and handler modules are implementation details; callers continue to depend only on AuthLogParser, Event, and the parser telemetry contract.
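The shared result shape and a program-keyed registry might look like the sketch below. The field names matched, event, failure_category, and reason come from the text above, and Event is a trimmed stand-in. Three parts of the sketch are assumptions made for illustration: HandlerRegistry itself, keying a std::map by the program tag, and filling in known_program_unknown_message when a handler returns no category. Looking handlers up by distinct program name, rather than trying them in registration order, is one way to get the order independence that tests/test_parser_properties.cpp checks. The real registry may achieve it differently, and this sketch does not show how PAM lines are routed.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

struct Event {
    std::string event_type;  // trimmed stand-in for the real Event
};

// Mirrors the shared handler result shape described above.
struct HandlerResult {
    bool matched = false;
    std::optional<Event> event;
    std::string failure_category;  // empty when the handler matched
    std::string reason;
};

using Handler = std::function<HandlerResult(const std::string& message)>;

// Hypothetical registry: lookup by distinct program name does not depend
// on the order in which handlers were registered.
class HandlerRegistry {
public:
    void add(const std::string& program, Handler handler) {
        handlers_[program] = std::move(handler);
    }

    HandlerResult dispatch(const std::string& program, const std::string& message) const {
        auto it = handlers_.find(program);
        if (it == handlers_.end()) {
            return {false, std::nullopt, "unknown_program", "no handler registered for " + program};
        }
        HandlerResult result = it->second(message);
        if (!result.matched && result.failure_category.empty()) {
            // failure_classifier role: known program, unmodeled message.
            result.failure_category = "known_program_unknown_message";
        }
        return result;
    }

private:
    std::map<std::string, Handler> handlers_;
};
```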

Parser failure categories are intentionally coarser than unknown-pattern buckets:

  • unknown_timestamp
  • unknown_program
  • known_program_unknown_message
  • malformed_source_ip
  • unsupported_pam_variant

Stable unsupported-pattern buckets currently exercised by the fixture corpus include sshd_connection_closed_preauth, sshd_timeout_or_disconnection, sshd_negotiation_failure, pam_faillock_account_locked, and pam_unix_session_closed. They are parser telemetry and warnings only; detector signal mappings decide which parsed events can contribute to findings.
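The split between coarse categories and finer unknown-pattern buckets can be sketched as a telemetry accumulator that records both for every warning. The five category strings come from the list above. FailureCategory, category_name, ParserTelemetry, and the tie-breaking rule in top_unknown_patterns are hypothetical, and this page does not specify which category each bucket falls under.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// The five coarse parser failure categories.
enum class FailureCategory {
    UnknownTimestamp,
    UnknownProgram,
    KnownProgramUnknownMessage,
    MalformedSourceIp,
    UnsupportedPamVariant,
};

const char* category_name(FailureCategory c) {
    switch (c) {
        case FailureCategory::UnknownTimestamp: return "unknown_timestamp";
        case FailureCategory::UnknownProgram: return "unknown_program";
        case FailureCategory::KnownProgramUnknownMessage: return "known_program_unknown_message";
        case FailureCategory::MalformedSourceIp: return "malformed_source_ip";
        case FailureCategory::UnsupportedPamVariant: return "unsupported_pam_variant";
    }
    return "unknown";  // unreachable for valid enumerators
}

// Hypothetical accumulator: each warning bumps one coarse category and one
// finer-grained unknown-pattern bucket. Nothing here creates an event.
struct ParserTelemetry {
    std::size_t unparsed_lines = 0;
    std::map<std::string, std::size_t> failure_categories;
    std::map<std::string, std::size_t> unknown_patterns;

    void record_warning(FailureCategory category, const std::string& bucket) {
        ++unparsed_lines;
        ++failure_categories[category_name(category)];
        ++unknown_patterns[bucket];
    }

    // Most frequent buckets first; ties broken by name for deterministic output.
    std::vector<std::pair<std::string, std::size_t>> top_unknown_patterns(std::size_t n) const {
        std::vector<std::pair<std::string, std::size_t>> sorted(unknown_patterns.begin(),
                                                                unknown_patterns.end());
        std::sort(sorted.begin(), sorted.end(), [](const auto& a, const auto& b) {
            return a.second != b.second ? a.second > b.second : a.first < b.first;
        });
        if (sorted.size() > n) sorted.resize(n);
        return sorted;
    }
};
```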

Detection signal boundary

Parsing a line does not automatically mean it should drive a detector. LogLens keeps that boundary explicit through AuthSignalConfig.

Default terminal SSH failure evidence:

  • ssh_failed_password
  • ssh_invalid_user
  • ssh_failed_publickey
  • ssh_failed_keyboard_interactive
  • ssh_max_auth_tries

Default lower-confidence attempt evidence:

  • pam_auth_failure, which is attempt evidence but not terminal failure evidence unless configured otherwise

Default sudo burst evidence:

  • sudo_command

Parsed successes and audit-only events remain reportable but do not count as brute-force or multi-user failure evidence by default.
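Here is a minimal sketch of a signal mapping in the spirit of AuthSignalConfig that keeps parsing and detection separate, using the default sets listed above. The struct layout and method names are hypothetical. The sketch also assumes that terminal failure evidence counts as attempt evidence too. Event types outside these sets, such as a parsed success, never count.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical layout; the real AuthSignalConfig may be shaped differently.
struct AuthSignalConfig {
    std::set<std::string> terminal_failure = {
        "ssh_failed_password", "ssh_invalid_user", "ssh_failed_publickey",
        "ssh_failed_keyboard_interactive", "ssh_max_auth_tries"};
    std::set<std::string> attempt_evidence = {"pam_auth_failure"};
    std::set<std::string> sudo_burst = {"sudo_command"};

    bool counts_as_terminal_failure(const std::string& type) const {
        return terminal_failure.count(type) > 0;
    }
    // Assumption: terminal failures are also attempts; pam_auth_failure is
    // attempt-only unless it is moved into terminal_failure.
    bool counts_as_attempt(const std::string& type) const {
        return counts_as_terminal_failure(type) || attempt_evidence.count(type) > 0;
    }
    bool counts_as_sudo_burst(const std::string& type) const {
        return sudo_burst.count(type) > 0;
    }
};
```

In this sketch, treating pam_auth_failure as terminal evidence (the "unless configured otherwise" case) amounts to adding it to terminal_failure. The parser does not change.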

Test corpus map

| Artifact | What it proves |
|---|---|
| tests/test_parser.cpp | Unit-level parser expectations, malformed-line behavior, mode aliases, fixture-matrix counts, and unknown-pattern buckets |
| tests/test_parser_properties.cpp | Deterministic generated checks for handler-registry order independence, malformed source tokens, failure taxonomy stability, and arbitrary-byte result invariants |
| tests/fuzz/README.md and tests/fuzz/corpus/parser | Optional Clang libFuzzer harness instructions plus a sanitized parser seed corpus used by the bounded CI fuzz smoke campaign |
| tests/test_detector.cpp | Detection signal mapping and default counting behavior after parsing |
| assets/parser_fixture_matrix_syslog.log | Syslog known/unknown parser matrix |
| assets/parser_fixture_matrix_journalctl_short_full.log | Journalctl short-full known/unknown parser matrix |
| assets/parser_auth_families_syslog.log | Syslog PAM/auth-family parser coverage |
| assets/parser_auth_families_journalctl_short_full.log | Journalctl PAM/auth-family parser coverage |
| assets/noisy_auth_sample.log and tests/fixtures/parser_matrix/noisy_auth_expected.json | Noisy syslog parser-coverage matrix for malformed, unsupported, blank, irrelevant, multi-host, and unusual-username input |
| assets/mixed_auth_corpus.log and assets/mixed_auth_parser_coverage.json | 160-line mixed auth corpus plus a reviewer-facing parser coverage artifact for dirty syslog input and selected util-linux login evidence |
| tests/test_report_contracts.cpp | Stable report-shape expectations for generated artifacts |

Non-goals

The parser does not try to:

  • infer missing syslog years
  • support every Linux authentication log variant
  • classify unsupported lines as findings
  • correlate across files or hosts
  • produce incident verdicts

Those boundaries are intentional for the MVP. The current priority is to keep parser coverage explicit and safely extensible.