Expand description
JSON Lines + SHA-256 hash-chained provenance log.
Binding spec: docs/PROVENANCE_LOG.md (NORMATIVE, §3 row schema, §4 hash
chain). Failure semantics: fail-closed — callers MUST abort the fetch
if a log write returns Err. See docs/SECURITY.md §1.8 and ADR-0006.
§On-disk format
- JSON Lines (
.jsonl): one JSON object per line, terminated by\n(LF). - UTF-8. Timestamps are RFC3339 in UTC.
- Each row is appended via a single
write_allwhose payload always ends in\n, so a partially-written row is detectable as a missing trailing newline rather than a torn JSON record. - In audit-grade mode (the only mode shipped here), the writer flushes the
BufWriterandfsyncs the file after every row.
§Hash chain (PROVENANCE_LOG.md §4)
Each row carries a prev_hash and a this_hash. The first row’s
prev_hash is the literal string "GENESIS". Every subsequent row’s
prev_hash MUST equal the previous row’s this_hash.
When a log file rotates (§6 — not yet implemented in this crate; see TODO
below), the first row of the NEW log file also uses prev_hash = "GENESIS", restarting the chain.
this_hash is computed as:
this_hash = lower_hex(SHA-256(canonical_json(row \ {this_hash})))where canonical_json is compact JSON (no whitespace) with object keys
sorted lexicographically (PROVENANCE_LOG.md §4). For a row with fields
{ts: "...", ts_seq: 1, event: "fetch", ...}, the canonical bytes begin
with {"capability":... because capability is the lex-first top-level
key. Downstream doiget audit-log --verify (Phase 1+) relies on this
exact rule — do not change the canonicalization without bumping the spec.
§In-process serialization
ProvenanceLog holds a Mutex<LogState>. All append calls within the
same process serialize on this mutex, satisfying the “process-local mutex
on log appender” requirement of docs/SECURITY.md §1.8. Cross-process
coordination (multiple doiget invocations) is out of scope here and
handled by the higher-level flock-based store layer.
§Session id
session_id (PROVENANCE_LOG.md §3) is a 26-char ULID generated once per
process invocation by the caller and stamped into every row written
through the resulting ProvenanceLog. This crate does not generate the
ULID itself — see ProvenanceLog::open for the contract.
§Log rotation and retention (§6)
Implemented (PROVENANCE_LOG.md §6): when access.log exceeds
ROTATE_BYTES (100 MiB) a subsequent ProvenanceLog::append
gzip-compresses the full file to access.log.<YYYY-MM-DD-HHMMSS>.gz,
removes the old access.log, and writes the incoming row as the
first row of a fresh file with prev_hash = "GENESIS" (the hash
chain restarts per segment — segments are NOT linked). Rotation
is fail-closed: any gzip / rename / unlink failure aborts the
append (the caller’s fetch aborts) so the chain never silently
skips. At ProvenanceLog::open, rotated .gz segments older than
the retention window (DOIGET_LOG_RETENTION_DAYS, default 90; 0
disables) are deleted best-effort (a prune failure is logged,
not fatal — pruning is housekeeping, not integrity).
verify_all verifies the current file plus every rotated .gz
segment (each its own GENESIS-rooted chain).
Structs§
- LogRow
- One row of the provenance log (PROVENANCE_LOG.md §3).
- Migration
Report - Summary of a
migrate_v1_to_v2run. - Provenance
Log - Append-only writer with in-process serialization.
- RowInput
- Caller-supplied fields for a row. The writer fills in
ts,ts_seq,session_id,prev_hash,this_hash, and the literalschema_version = "v2"(LOG_SCHEMA_VERSION). - Verify
Issue - A single issue discovered by
verify. - Verify
Report - Outcome of
verify: per-row chain status across the entire log.
Enums§
- Capability
- Capability under which a row was written (PROVENANCE_LOG.md §3).
- LogError
- Errors emitted by the provenance log writer. Callers MUST treat any variant as a fail-closed signal and abort the surrounding fetch.
- LogEvent
- Event class for a log row (PROVENANCE_LOG.md §3).
- LogResult
- Per-row outcome (PROVENANCE_LOG.md §3).
non_exhaustivefor forward compatibility. - Verify
Issue Kind - Classification of a
VerifyIssue.non_exhaustivefor forward compatibility — future kinds may includeSessionIdChange, etc.
Constants§
- LOG_
SCHEMA_ VERSION - Provenance-log row schema version this build writes
(
docs/PROVENANCE_LOG.md§3, ADR-0024).
Functions§
- migrate_
v1_ to_ v2 - Migrate a v1 provenance log to v2 (ADR-0024).
- verify
- Verify the entire log file at
path. - verify_
all - Verify the full provenance history: every rotated
.gzsegment (oldest→newest) followed by the currentaccess.log. Each segment is its own GENESIS-rooted hash chain (segments are deliberately NOT linked across a rotation, PROVENANCE_LOG.md §6), so they are verified independently and reported per-segment.