Skip to main content

Module provenance

Module provenance 

Source
Expand description

JSON Lines + SHA-256 hash-chained provenance log.

Binding spec: docs/PROVENANCE_LOG.md (NORMATIVE, §3 row schema, §4 hash chain). Failure semantics: fail-closed — callers MUST abort the fetch if a log write returns Err. See docs/SECURITY.md §1.8 and ADR-0006.

§On-disk format

  • JSON Lines (.jsonl): one JSON object per line, terminated by \n (LF).
  • UTF-8. Timestamps are RFC3339 in UTC.
  • Each row is appended via a single write_all whose payload always ends in \n, so a partially-written row is detectable as a missing trailing newline rather than a torn JSON record.
  • In audit-grade mode (the only mode shipped here), the writer flushes the BufWriter and fsyncs the file after every row.

§Hash chain (PROVENANCE_LOG.md §4)

Each row carries a prev_hash and a this_hash. The first row’s prev_hash is the literal string "GENESIS". Every subsequent row’s prev_hash MUST equal the previous row’s this_hash.

When a log file rotates (§6 — not yet implemented in this crate; see TODO below), the first row of the NEW log file also uses prev_hash = "GENESIS", restarting the chain.

this_hash is computed as:

this_hash = lower_hex(SHA-256(canonical_json(row \ {this_hash})))

where canonical_json is compact JSON (no whitespace) with object keys sorted lexicographically (PROVENANCE_LOG.md §4). For a row with fields {ts: "...", ts_seq: 1, event: "fetch", ...}, the canonical bytes begin with {"capability":... because capability is the lex-first top-level key. Downstream doiget audit-log --verify (Phase 1+) relies on this exact rule — do not change the canonicalization without bumping the spec.

§In-process serialization

ProvenanceLog holds a Mutex<LogState>. All append calls within the same process serialize on this mutex, satisfying the “process-local mutex on log appender” requirement of docs/SECURITY.md §1.8. Cross-process coordination (multiple doiget invocations) is out of scope here and handled by the higher-level flock-based store layer.

§Session id

session_id (PROVENANCE_LOG.md §3) is a 26-char ULID generated once per process invocation by the caller and stamped into every row written through the resulting ProvenanceLog. This crate does not generate the ULID itself — see ProvenanceLog::open for the contract.

§Log rotation and retention (§6)

Implemented (PROVENANCE_LOG.md §6): when access.log exceeds ROTATE_BYTES (100 MiB) a subsequent ProvenanceLog::append gzip-compresses the full file to access.log.<YYYY-MM-DD-HHMMSS>.gz, removes the old access.log, and writes the incoming row as the first row of a fresh file with prev_hash = "GENESIS" (the hash chain restarts per segment — segments are NOT linked). Rotation is fail-closed: any gzip / rename / unlink failure aborts the append (the caller’s fetch aborts) so the chain never silently skips. At ProvenanceLog::open, rotated .gz segments older than the retention window (DOIGET_LOG_RETENTION_DAYS, default 90; 0 disables) are deleted best-effort (a prune failure is logged, not fatal — pruning is housekeeping, not integrity). verify_all verifies the current file plus every rotated .gz segment (each its own GENESIS-rooted chain).

Structs§

LogRow
One row of the provenance log (PROVENANCE_LOG.md §3).
MigrationReport
Summary of a migrate_v1_to_v2 run.
ProvenanceLog
Append-only writer with in-process serialization.
RowInput
Caller-supplied fields for a row. The writer fills in ts, ts_seq, session_id, prev_hash, this_hash, and the literal schema_version = "v2" (LOG_SCHEMA_VERSION).
VerifyIssue
A single issue discovered by verify.
VerifyReport
Outcome of verify: per-row chain status across the entire log.

Enums§

Capability
Capability under which a row was written (PROVENANCE_LOG.md §3).
LogError
Errors emitted by the provenance log writer. Callers MUST treat any variant as a fail-closed signal and abort the surrounding fetch.
LogEvent
Event class for a log row (PROVENANCE_LOG.md §3).
LogResult
Per-row outcome (PROVENANCE_LOG.md §3). non_exhaustive for forward compatibility.
VerifyIssueKind
Classification of a VerifyIssue. non_exhaustive for forward compatibility — future kinds may include SessionIdChange, etc.

Constants§

LOG_SCHEMA_VERSION
Provenance-log row schema version this build writes (docs/PROVENANCE_LOG.md §3, ADR-0024).

Functions§

migrate_v1_to_v2
Migrate a v1 provenance log to v2 (ADR-0024).
verify
Verify the entire log file at path.
verify_all
Verify the full provenance history: every rotated .gz segment (oldest→newest) followed by the current access.log. Each segment is its own GENESIS-rooted hash chain (segments are deliberately NOT linked across a rotation, PROVENANCE_LOG.md §6), so they are verified independently and reported per-segment.