Security

Status: NORMATIVE. This document defines binding security contracts and threat surfaces. Implementations and reviewers MUST address each surface before introducing code in the affected area. Changes require a new ADR in DECISIONS/.

For vulnerability reporting, see ../CONTACT.md. Do not file a public issue for security disclosures.


1. Threat surfaces

1.1 Input — DOI / arXiv id strings

Source: CLI argument or MCP tool argument. Trust level: untrusted.

| Vector | Mitigation |
| --- | --- |
| Path traversal in DOI suffix | Strict regex (^10\.\d{4,9}/[A-Za-z0-9._/():-]+$); safekey algorithm escapes all characters outside [A-Za-z0-9._\-_] (see SAFEKEY.md). The colon (:) is in-charset (ADR-0026) to admit legacy Kluwer (10.1023/A:NNNN) and EDP Sciences / Journal de Physique (10.1051/jphys:NNNN) DOIs. It grants no traversal capability (traversal requires composing / and . into ../, and both characters are already in the suffix charset), and safekey escapes it before any filesystem use, so : never reaches a path literally. |
| Excessively long suffix | DOI_SUFFIX_MAX_LEN = 256 chars; longer inputs are rejected with INVALID_REF. |
| Regex DoS | Validation regex is anchored and deterministic, with no nested quantifiers. |
| Log injection (CR / LF / control chars) | Provenance log is JSON Lines; all string fields are JSON-escaped, so control chars become \uXXXX. |
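
The validation rules above can be sketched without a regex engine. This is a minimal illustration, not doiget's actual code: the function names validate_doi and is_suffix_char are hypothetical, and the error is modelled as a plain string rather than a real error type. It assumes the length cap counts characters of the suffix after the first slash.

```rust
/// Cap on the DOI suffix length, as stated in the table above.
const DOI_SUFFIX_MAX_LEN: usize = 256;

/// True when `c` is in the suffix charset [A-Za-z0-9._/():-].
fn is_suffix_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '/' | '(' | ')' | ':' | '-')
}

/// Hand-written equivalent of ^10\.\d{4,9}/[A-Za-z0-9._/():-]+$ plus the
/// length cap. Hypothetical helper; the error string stands in for INVALID_REF.
fn validate_doi(input: &str) -> Result<(), &'static str> {
    // Prefix "10." followed by a registrant code of 4 to 9 ASCII digits.
    let rest = input.strip_prefix("10.").ok_or("INVALID_REF")?;
    let (registrant, suffix) = rest.split_once('/').ok_or("INVALID_REF")?;
    let registrant_ok = (4..=9).contains(&registrant.len())
        && registrant.chars().all(|c| c.is_ascii_digit());

    // Non-empty suffix, bounded length, restricted charset.
    let suffix_ok = !suffix.is_empty()
        && suffix.len() <= DOI_SUFFIX_MAX_LEN
        && suffix.chars().all(is_suffix_char);

    if registrant_ok && suffix_ok {
        Ok(())
    } else {
        Err("INVALID_REF")
    }
}
```

A single linear pass with no backtracking keeps the check immune to pathological inputs, which is the same property the anchored regex is chosen for. Note that a suffix such as ../x still passes this stage; neutralizing it is the job of safekey, not of validation.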

1.2 HTTP responses

Source: publisher / source API. Trust level: partially trusted (TLS-authenticated host, content-typed payload).

| Vector | Mitigation |
| --- | --- |
| Oversized PDF | Streaming download with body cap (PDF_MAX_BYTES = 100_000_000); writes to a temp file, validated then renamed. |
| Malformed JSON | serde_json strict mode; deserialization errors map to STORE_ERROR or NETWORK_ERROR. |
| Magic-byte mismatch | PDFs are checked for %PDF- header. Files failing this are deleted and the fetch errors. |
| Slowloris-style stalled response | reqwest per-request timeouts (connect 10s, read 60s, total 300s). |
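
The size cap and the magic-byte check can be combined in one streaming copy. The sketch below uses only the standard library and works on any reader, so it is independent of the HTTP client; the function name copy_pdf_capped is hypothetical and the error kinds are placeholders for whatever doiget actually reports.

```rust
use std::io::{self, Read, Write};

/// Copies `body` into `sink`, rejecting bodies that do not start with
/// "%PDF-" or that exceed `cap` bytes. Hypothetical helper for illustration.
fn copy_pdf_capped<R: Read, W: Write>(body: R, sink: &mut W, cap: u64) -> io::Result<u64> {
    // Allow one byte past the cap so an oversized body is detectable
    // without reading it to the end.
    let mut limited = body.take(cap.saturating_add(1));

    // Fill the 5-byte header, tolerating short reads.
    let mut header = [0u8; 5];
    let mut filled = 0;
    while filled < header.len() {
        let n = limited.read(&mut header[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    if !header[..filled].starts_with(b"%PDF-") {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "missing %PDF- header"));
    }

    sink.write_all(&header)?;
    let copied = header.len() as u64 + io::copy(&mut limited, sink)?;
    if copied > cap {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "body exceeds size cap"));
    }
    Ok(copied)
}
```

In this arrangement the sink would be the temp file, and the caller would delete it on any Err before reporting the fetch failure. Timeouts are not modelled here; they belong to the HTTP client configuration.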

1.3 HTTP redirects

| Vector | Mitigation |
| --- | --- |
| Redirect to file://, data:, internal | reqwest is configured with redirect policy RedirectPolicy::custom: only https:// redirects allowed. |
| Redirect to attacker host | Per-source allowlist of redirect target hosts; redirects outside the allowlist abort the fetch. See REDIRECT_ALLOWLIST.md. |
| Redirect loop | redirect_limit = 10. |
| Open-redirect SSRF chain | Tool inputs never accept URLs (only DOI / arXiv id). All URLs are constructed from validated source-side templates. |
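
The three redirect rules (https only, allowlisted host, bounded hop count) reduce to a small decision function that a redirect callback could call for each hop. The sketch below is an assumption about shape, not the project's code: decide_redirect and RedirectDecision are hypothetical names, hosts are matched exactly (no wildcard or subdomain matching), and the scheme and host are assumed to come from an already-parsed URL.

```rust
/// Hop limit from the table above.
const REDIRECT_LIMIT: usize = 10;

/// Outcome of evaluating one redirect hop (hypothetical type).
#[derive(Debug, PartialEq)]
enum RedirectDecision {
    Follow,
    Abort(&'static str),
}

/// Decides whether to follow a redirect to `scheme://host/...`.
/// `hops_so_far` is the number of redirects already followed for this fetch;
/// `allowlist` is the per-source list of permitted target hosts.
fn decide_redirect(
    scheme: &str,
    host: &str,
    hops_so_far: usize,
    allowlist: &[&str],
) -> RedirectDecision {
    if hops_so_far >= REDIRECT_LIMIT {
        return RedirectDecision::Abort("redirect limit reached");
    }
    // Rejects file://, data:, http:// and anything else that is not https.
    if !scheme.eq_ignore_ascii_case("https") {
        return RedirectDecision::Abort("non-https redirect target");
    }
    // Exact, case-insensitive host match against the allowlist.
    if !allowlist.iter().any(|allowed| host.eq_ignore_ascii_case(allowed)) {
        return RedirectDecision::Abort("host not in allowlist");
    }
    RedirectDecision::Follow
}
```

Keeping the decision pure makes it easy to unit-test every row of the table without a network; the HTTP client integration then only has to translate Follow and Abort into its own follow and stop actions.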

1.4 MCP server inputs

Source: MCP host (LLM agent loop). Trust level: untrusted — even when the host is a trusted application, the agent may relay attacker-controlled paper text or hallucinated identifiers.

| Vector | Mitigation |
| --- | --- |
| Hallucinated DOI | INVALID_REF returned; never crashes the server. |
| Prompt injection from paper abstract | doiget never reads paper content; tool inputs are typed, not free text. |
| fetch_url(url: ...) style abuse | Permanent non-goal. No tool accepts a URL. |
| Crafted long ref to overflow log | DOI_SUFFIX_MAX_LEN = 256 is enforced (truncate-or-reject) before the ref reaches the log. |
| Per-session fetch flood | MAX_CONCURRENT_FETCHES = 5 (process-wide), MCP_BATCH_MAX_SIZE = 100; exceeding queue depth MCP_QUEUE_DEPTH_MAX = 100 returns RATE_LIMITED. |
| Crafted Crossref response that misroutes a fetch | All fetch URLs are constructed from validated identifiers; we trust the source's TLS-authenticated response only for OA URL discovery, then re-validate the resulting URL against the per-source allowlist. |
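
One way to picture the flood limits is an admission gate that reserves queue slots atomically and refuses work once the depth limit would be exceeded. This is a sketch under stated assumptions: FetchQueue, try_enqueue, complete, and the Rejected enum are hypothetical, the source does not say which error an oversized batch returns (BatchTooLarge is a placeholder), and the separate MAX_CONCURRENT_FETCHES limit on in-flight fetches is not modelled.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const MCP_BATCH_MAX_SIZE: usize = 100;
const MCP_QUEUE_DEPTH_MAX: usize = 100;

/// Why a request was refused (hypothetical type).
#[derive(Debug, PartialEq)]
enum Rejected {
    /// Placeholder: the error code for an oversized batch is an assumption.
    BatchTooLarge,
    /// Corresponds to RATE_LIMITED in the table above.
    RateLimited,
}

/// Tracks how many fetches are currently queued.
struct FetchQueue {
    depth: AtomicUsize,
}

impl FetchQueue {
    fn new() -> Self {
        Self { depth: AtomicUsize::new(0) }
    }

    /// Reserves `n` queue slots, or refuses without changing the depth.
    fn try_enqueue(&self, n: usize) -> Result<(), Rejected> {
        if n > MCP_BATCH_MAX_SIZE {
            return Err(Rejected::BatchTooLarge);
        }
        self.depth
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |depth| {
                // Returning None leaves the counter untouched.
                let next = depth.checked_add(n)?;
                (next <= MCP_QUEUE_DEPTH_MAX).then_some(next)
            })
            .map(|_| ())
            .map_err(|_| Rejected::RateLimited)
    }

    /// Releases `n` slots once the corresponding fetches have finished.
    fn complete(&self, n: usize) {
        self.depth.fetch_sub(n, Ordering::AcqRel);
    }
}
```

Because the check and the reservation happen in one atomic update, two concurrent callers cannot both squeeze past the limit.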

1.5 Filesystem (Store, Cache, Log)

| Vector | Mitigation |
| --- | --- |
| Path traversal in safekey | safekey algorithm replaces every character outside [A-Za-z0-9._\-_] with _. Reference test vectors in SAFEKEY.md. |
| Concurrent writers (BiblioFetch.jl + doiget) | flock on <safekey>.toml.lock, 5s timeout. (STORE.md §Contract 2) |
| Partial write on crash | Write to <safekey>.toml.tmp → fsync → rename → fsync parent. (STORE.md §Contract 3) |
| Log file tampering | SHA-256 hash chain; chattr +a attempted on Linux; doiget audit-log --verify recomputes the chain. |
| Disk-full DoS via large PDFs | Per-fetch size cap; on disk-full the fetch errors and the partial temp file is cleaned up. |
| Credential file readable to other users | Startup warns if credentials.toml permissions are not 0600 on POSIX. |
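
The crash-safe write sequence (temp file, fsync, rename, fsync of the parent directory) can be sketched with the standard library alone. The function name write_atomically is hypothetical, the .tmp suffix follows the naming in the table, and the flock step from Contract 2 is deliberately left out; STORE.md remains the binding description.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `contents` to `target` so that a crash leaves either the old file
/// or the complete new file, never a partial one. Hypothetical helper.
fn write_atomically(target: &Path, contents: &[u8]) -> io::Result<()> {
    // 1. Write to "<target>.tmp" in the same directory, so the rename below
    //    stays on one filesystem.
    let mut tmp_name = target.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(contents)?;

    // 2. Flush file data and metadata to disk before the rename.
    tmp.sync_all()?;
    drop(tmp);

    // 3. Atomically replace the target.
    fs::rename(&tmp_path, target)?;

    // 4. Persist the directory entry itself (POSIX only).
    #[cfg(unix)]
    {
        let parent = match target.parent() {
            Some(p) if !p.as_os_str().is_empty() => p,
            _ => Path::new("."),
        };
        File::open(parent)?.sync_all()?;
    }
    Ok(())
}
```

Skipping step 4 is a common omission: without it the rename may not survive a power loss even though the file contents were synced.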

1.6 Secrets / credentials

| Vector | Mitigation |
| --- | --- |
| Bundled API key in binary | Banned by code review; CI greps source for known publisher key formats. No string constant in source may match sk-, Bearer , etc. |
| Logged in raw form | All credential types are secrecy::Secret<String>; Display and Debug print ****; tracing uses a redactor for known field names. |
| Leaked via error message | Errors avoid printing source URLs that contain query-param keys (e.g., ?apikey=...). |
| Persisted in shell history | Recommend ~/.config/doiget/credentials.toml over inline env in shell rc; documented in CONFIG.md. |
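
The redaction behaviour can be illustrated with a small wrapper type. This is a standard-library stand-in written for this page, not the secrecy crate and not doiget's credential type: Redacted, new, and expose are hypothetical names. It shows only the property the table relies on, namely that formatting a credential never reveals it.

```rust
use std::fmt;

/// Wrapper whose Display and Debug output is always "****".
/// Hypothetical stand-in for a secret-holding type.
struct Redacted(String);

impl Redacted {
    fn new(value: impl Into<String>) -> Self {
        Self(value.into())
    }

    /// The only way to read the value; call sites are easy to audit.
    fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Redacted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("****")
    }
}

impl fmt::Display for Redacted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("****")
    }
}
```

Because Debug is implemented by hand, any struct that derives Debug and contains a Redacted field inherits the masking, so an accidental {:?} in a log statement stays harmless.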

1.7 PDF content (after fetch)

doiget does not parse PDF content (ADR-0003). Malicious PDFs (embedded JS, exploits) are stored as opaque blobs; their handling is the responsibility of any downstream tool the user pipes the path into.

This is a deliberate design choice. doiget does not implement countermeasures for malicious PDFs because doiget does not interact with their content.

1.8 Concurrent processes (multiple doiget invocations)

| Vector | Mitigation |
| --- | --- |
| Race on store write | flock (see 1.5). |
| Log write interleaving | Process-local mutex on log appender; fsync per write in audit-grade mode. |
| Cache race | ~/.cache/doiget/ writes go through atomic rename. |

1.9 Supply chain

| Vector | Mitigation |
| --- | --- |
| Malicious dependency update | cargo-vet audit chain; cargo-deny allowlist; pinned Cargo.lock. |
| Hijacked author GitHub account | 2FA required; verified-signed commits enforced on main; release workflow gated by GitHub Environment with manual approval. |
| Malicious release artifact swap | Sigstore keyless signing of release binaries; verifiable with cosign verify-blob. |
| cargo publish token leak | Use crates.io trusted publishing (OIDC); no long-lived token in repo. |
| 3rd-party Action injection | All Actions pinned by SHA, not floating tag; Dependabot updates SHAs. |
| Reproducible builds | Cargo.lock committed; rust-toolchain.toml pins rustc; RUSTFLAGS fixed in release-plz.yml. |

1.10 Network side channel

doiget cannot prevent third parties (ISP, institution DNS resolver, transit network) from observing the existence of fetches, even with correctly configured TLS:

doiget honors the user's HTTPS_PROXY environment variable; users who require unobservability should configure their network layer (Tor, VPN) externally. doiget does not provide its own proxying or anonymization.

doiget sends a stable User-Agent header per fetch to comply with each source's politeness policy:

User-Agent: doiget/<version> (+https://github.com/sotashimozono/doiget)

1.11 Auto-update / telemetry

doiget contains no auto-update path, no version check, no crash-report transmission, and no usage analytics (ADR-0015). These are denied at the dependency level via cargo-deny to prevent inadvertent introduction.

2. Defense-in-depth controls

The following controls are established:

3. MCP server additional controls

4. Release additional controls

5. Vulnerability disclosure

See ../CONTACT.md §"Security disclosures". Do not file a public issue.

6. Limitations (transparently acknowledged)

doiget cannot defend against:

These are out of scope for doiget's threat model and are noted here to set realistic expectations.


Source: site/content/developer/security.md