Capability profile

```rust

CapabilityProfile

Status: NORMATIVE. Defines the runtime capability gate that authorizes which sources may be invoked. Every Source::fetch implementation MUST require a &CapabilityProfile argument; sources whose capability is not granted at startup cannot be invoked at the type level.

1. Type definition

// These types are defined in the doiget-core crate.

use secrecy::SecretString; // = SecretBox<str>; the `secrecy` 0.10 owned-string secret
use chrono::{DateTime, Utc};

// All structs below are #[non_exhaustive] in the Rust source. External crates
// cannot construct them via struct-literal syntax — go through
// `CapabilityProfile::from_env()` (see §2). `TdmGrant::api_key` exists only
// when at least one `tdm-*` Cargo feature is compiled in (the `secrecy` dep is
// `optional = true` and gated on those features per ADR-0002). The field is
// additive under `#[non_exhaustive]`; default release binaries — which contain
// no TDM code at all — do not carry it.

#[derive(Debug, Clone)]
#[non_exhaustive]
pub struct CapabilityProfile {
    pub oa: AlwaysOn,
    pub metadata: MetadataAccess,
    pub tdm_elsevier: Option<TdmGrant>,
    pub tdm_aps: Option<TdmGrant>,
    pub tdm_springer: Option<TdmGrant>,
    pub rate_limits: RateLimits,
}

#[derive(Debug, Clone, Copy)]
pub struct AlwaysOn;   // unit struct — Tier 1 OA is always permitted

#[derive(Debug, Clone, Default)]
#[non_exhaustive]
pub struct MetadataAccess {
    pub openalex: bool,
    pub semantic_scholar: bool,
    pub doaj: bool,
}

#[derive(Debug, Clone)]
#[non_exhaustive]
pub struct TdmGrant {
    // Present only under a `tdm-*` feature (see the note above). `secrecy`
    // 0.10 replaced `Secret<String>` with `SecretString` (= `SecretBox<str>`).
    #[cfg(any(feature = "tdm-elsevier", feature = "tdm-aps", feature = "tdm-springer"))]
    pub api_key:       SecretString,
    pub agreed_at:     DateTime<Utc>,
    pub agree_env_var: String,            // e.g. "DOIGET_AGREE_TDM_ELSEVIER"
}

#[derive(Debug, Clone, Copy)]
#[non_exhaustive]
pub struct RateLimits {
    pub(crate) max_concurrent_fetches: u32, // hard-coded 5 (LEGAL §6 safeguard 8)
    pub(crate) max_fetches_per_second: f32, // hard-coded 5.0
    pub(crate) per_source_backoff_ms:  u64, // hard-coded 200
}

impl RateLimits {
    /// Sole public constructor. There is no other way to obtain a
    /// `RateLimits` outside of `doiget-core`: fields are `pub(crate)`, the
    /// struct is `#[non_exhaustive]`, and no public `new`-style function
    /// exists. This closes the legal-safeguard loophole that bare `pub`
    /// fields would create (cf. `docs/LEGAL.md` §6 safeguard 8).
    pub const HARD_CODED: Self = Self {
        max_concurrent_fetches: 5,
        max_fetches_per_second: 5.0,
        per_source_backoff_ms:  200,
    };

    pub const fn max_concurrent_fetches(&self) -> u32 { self.max_concurrent_fetches }
    pub const fn max_fetches_per_second(&self) -> f32 { self.max_fetches_per_second }
    pub const fn per_source_backoff_ms(&self)  -> u64 { self.per_source_backoff_ms }
}
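
// A minimal sketch, not taken from doiget-core: one way a caller might turn the
// hard-coded limits into pacing values using only the read-only accessors.
// `LimitsSketch`, `min_request_interval`, and `per_source_backoff` are
// hypothetical names; `LimitsSketch` mirrors two of the `RateLimits` fields so
// that the snippet compiles on its own.
use std::time::Duration;

#[derive(Debug, Clone, Copy)]
pub struct LimitsSketch {
    max_fetches_per_second: f32,
    per_source_backoff_ms: u64,
}

impl LimitsSketch {
    pub const HARD_CODED: Self = Self {
        max_fetches_per_second: 5.0,
        per_source_backoff_ms: 200,
    };

    pub const fn max_fetches_per_second(&self) -> f32 { self.max_fetches_per_second }
    pub const fn per_source_backoff_ms(&self) -> u64 { self.per_source_backoff_ms }
}

/// Smallest gap between request starts that keeps the rate at or below
/// `max_fetches_per_second` (roughly 0.2 s for the hard-coded 5.0).
pub fn min_request_interval(limits: &LimitsSketch) -> Duration {
    Duration::from_secs_f32(1.0 / limits.max_fetches_per_second())
}

/// The per-source backoff as a `Duration` rather than raw milliseconds.
pub fn per_source_backoff(limits: &LimitsSketch) -> Duration {
    Duration::from_millis(limits.per_source_backoff_ms())
}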

External construction

External crates always go through:

let profile = CapabilityProfile::from_env()?;

Struct-literal construction (CapabilityProfile { oa: ..., ... }) is blocked outside doiget-core by #[non_exhaustive]. Tests inside doiget-core may still construct profiles directly for fixture purposes.

api_key is wrapped in secrecy::SecretString (the secrecy 0.10 replacement for the 0.9 Secret<String>) so that Debug prints a redaction placeholder rather than the key. Logs additionally use a redactor for known sensitive field names, and any URL that carries a key as a query parameter (Springer Nature — see §3) is passed through redact_api_key_in_url before it is logged or recorded in provenance.
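
// A rough sketch of query-parameter redaction in the spirit of
// `redact_api_key_in_url`. The real function's signature is not shown in this
// document, so the name `redact_query_param_sketch`, the `param` argument, and
// the `REDACTED` placeholder are all assumptions. It compares parameter names
// byte for byte and does not decode percent-encoding.
pub fn redact_query_param_sketch(url: &str, param: &str) -> String {
    // Set the fragment aside first so that it is carried over unchanged.
    let (before_fragment, fragment) = match url.split_once('#') {
        Some((head, frag)) => (head, Some(frag)),
        None => (url, None),
    };
    let (base, query) = match before_fragment.split_once('?') {
        Some((base, query)) => (base, query),
        None => return url.to_string(), // no query string, nothing to redact
    };
    // Rewrite only the pairs whose name matches; keep every other pair as is.
    let redacted: Vec<String> = query
        .split('&')
        .map(|pair| match pair.split_once('=') {
            Some((name, _value)) if name == param => format!("{name}=REDACTED"),
            _ => pair.to_string(),
        })
        .collect();
    let mut out = format!("{base}?{}", redacted.join("&"));
    if let Some(frag) = fragment {
        out.push('#');
        out.push_str(frag);
    }
    out
}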

2. Resolution algorithm

use std::env;

impl CapabilityProfile {
    pub fn from_env() -> Result<Self, CapabilityError> {
        Ok(Self {
            oa: AlwaysOn,
            metadata: MetadataAccess {
                openalex:        env::var("DOIGET_ENABLE_OPENALEX").is_ok(),
                semantic_scholar: env::var("DOIGET_ENABLE_S2").is_ok(),
                doaj:            env::var("DOIGET_ENABLE_DOAJ").is_ok(),
            },
            tdm_elsevier: read_tdm_grant("DOIGET_AGREE_TDM_ELSEVIER", "DOIGET_KEY_ELSEVIER")?,
            tdm_aps:      read_tdm_grant("DOIGET_AGREE_TDM_APS",      "DOIGET_KEY_APS")?,
            tdm_springer: read_tdm_grant("DOIGET_AGREE_TDM_SPRINGER", "DOIGET_KEY_SPRINGER")?,
            rate_limits:  RateLimits::HARD_CODED,
        })
    }
}

fn read_tdm_grant(agree_var: &str, key_var: &str) -> Result<Option<TdmGrant>, CapabilityError> {
    let agreed = matches!(env::var(agree_var).as_deref(), Ok("1"));
    let key    = env::var(key_var).ok();
    match (agreed, key) {
        (true, Some(k)) => Ok(Some(TdmGrant {
            // `secrecy` 0.10: `SecretString::from(String)` replaces the
            // 0.9 `Secret::new(k)`. Field present only under a `tdm-*`
            // feature; the actual source splits this into a small
            // `build_tdm_grant` helper so the cfg lives in one place.
            api_key:       SecretString::from(k),
            agreed_at:     Utc::now(),
            agree_env_var: agree_var.to_string(),
        })),
        (true, None)    => Err(CapabilityError::AgreedButNoKey {
            agree_var: agree_var.into(), key_var: key_var.into(),
        }),
        (false, Some(_)) => Err(CapabilityError::KeyButNotAgreed {
            agree_var: agree_var.into(),
        }),
        (false, None)   => Ok(None),
    }
}

Three resolution rules

  1. agree=1 + key present → Some(TdmGrant). The source is enabled for this session.
  2. agree=1 but key missing → Err(AgreedButNoKey). Startup fails: the user has agreed but provided no credential, and silently skipping the source would mask a misconfiguration.
  3. agree unset (or set to anything other than 1) but key present → Err(KeyButNotAgreed). Startup fails: the explicit agreement variable is required even when the key is set. Otherwise a DOIGET_KEY_ELSEVIER leaked from a parent shell environment could enable a source the user did not intend.
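
// The rules above, plus the fourth case in which neither variable is set, can
// be exercised without touching the process environment. A minimal sketch:
// `Outcome` and `resolve` are hypothetical stand-ins for
// `Result<Option<TdmGrant>, CapabilityError>` and `read_tdm_grant`.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Granted,         // Ok(Some(TdmGrant))
    Disabled,        // Ok(None)
    AgreedButNoKey,  // Err(CapabilityError::AgreedButNoKey)
    KeyButNotAgreed, // Err(CapabilityError::KeyButNotAgreed)
}

pub fn resolve(agree: Option<&str>, key: Option<&str>) -> Outcome {
    // Only the exact string "1" counts as agreement, as in `read_tdm_grant`.
    let agreed = matches!(agree, Some("1"));
    match (agreed, key) {
        (true, Some(_)) => Outcome::Granted,
        (true, None) => Outcome::AgreedButNoKey,
        (false, Some(_)) => Outcome::KeyButNotAgreed,
        (false, None) => Outcome::Disabled,
    }
}

// One consequence worth noting: `resolve(Some("true"), Some("k"))` yields
// `KeyButNotAgreed`, because "true" is not the literal "1".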

3. Environment variable reference

| Variable | Type | Effect |
| --- | --- | --- |
| DOIGET_ENABLE_OPENALEX | presence | Enables OpenAlex (metadata only). |
| DOIGET_ENABLE_S2 | presence | Enables Semantic Scholar. |
| DOIGET_ENABLE_DOAJ | presence | Enables DOAJ. |
| DOIGET_AGREE_TDM_ELSEVIER | =1 | Acknowledges Elsevier TDM ToS. Must be paired with DOIGET_KEY_ELSEVIER. |
| DOIGET_KEY_ELSEVIER | secret string | Elsevier API key. Read into SecretString. |
| DOIGET_AGREE_TDM_APS | =1 | Acknowledges APS Harvest TDM ToS. |
| DOIGET_KEY_APS | secret string | APS API key. |
| DOIGET_AGREE_TDM_SPRINGER | =1 | Acknowledges Springer Nature OA ToS. |
| DOIGET_KEY_SPRINGER | secret string | Springer API key. |

Setting DOIGET_AGREE_TDM_* only makes the relevant source eligible. The corresponding TDM-specific Cargo feature must also have been compiled in (cargo build --features tdm-elsevier etc.). Default release binaries do not contain TDM source code at all.
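
// A small sketch of the two independent gates described above: the Cargo
// feature (fixed at compile time) and the grant (resolved at startup).
// `elsevier_usable` is a hypothetical helper, not part of doiget-core. In the
// real crate the TDM code is removed entirely by `#[cfg(...)]`; `cfg!` is used
// here only so that the snippet compiles with or without the feature.
pub fn elsevier_usable(grant_present: bool) -> bool {
    // `cfg!` expands to a literal `true` or `false` at compile time, so a
    // binary built without the feature answers `false` whatever the
    // environment says.
    cfg!(feature = "tdm-elsevier") && grant_present
}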

4. Source trait integration

pub trait Source: Send + Sync {
    fn name(&self) -> &str;
    fn can_serve(&self, profile: &CapabilityProfile, ref_: &Ref) -> bool;
    async fn fetch(&self, ref_: &Ref, profile: &CapabilityProfile, ctx: &FetchContext)
        -> Result<FetchResult, FetchError>;
}

// Example: Elsevier TDM source (only compiled when feature = "tdm-elsevier")
#[cfg(feature = "tdm-elsevier")]
impl Source for ElsevierTdm {
    fn can_serve(&self, p: &CapabilityProfile, _: &Ref) -> bool {
        p.tdm_elsevier.is_some()
    }
    async fn fetch(&self, ref_: &Ref, profile: &CapabilityProfile, ctx: &FetchContext)
        -> Result<FetchResult, FetchError>
    {
        let grant = profile.tdm_elsevier.as_ref()
            .ok_or(FetchError::CapabilityDenied)?;
        // Use the API key via grant.api_key.expose_secret(); this needs
        // `use secrecy::ExposeSecret;` in scope because it is a trait method.
        // ...
    }
}
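
// A hedged sketch of how a dispatcher might use `can_serve` as a cheap
// pre-check before calling `fetch`. Everything here is a simplified stand-in:
// `ProfileSketch` replaces `CapabilityProfile`, `SourceSketch` drops the
// `Ref` argument and the async `fetch`, and the source names "oa" and
// "elsevier" are illustrative only.
pub struct ProfileSketch {
    pub tdm_elsevier_granted: bool,
}

pub trait SourceSketch {
    fn name(&self) -> &str;
    fn can_serve(&self, profile: &ProfileSketch) -> bool;
}

pub struct OpenAccessSketch;
pub struct ElsevierTdmSketch;

impl SourceSketch for OpenAccessSketch {
    fn name(&self) -> &str { "oa" }
    // Tier 1 OA is always permitted, so the profile is not consulted.
    fn can_serve(&self, _profile: &ProfileSketch) -> bool { true }
}

impl SourceSketch for ElsevierTdmSketch {
    fn name(&self) -> &str { "elsevier" }
    fn can_serve(&self, profile: &ProfileSketch) -> bool { profile.tdm_elsevier_granted }
}

/// Names of the sources that the given profile allows, in registration order.
pub fn eligible_sources<'a>(
    sources: &'a [Box<dyn SourceSketch>],
    profile: &ProfileSketch,
) -> Vec<&'a str> {
    sources
        .iter()
        .filter(|s| s.can_serve(profile))
        .map(|s| s.name())
        .collect()
}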

5. Startup banner (auditability)

On startup, doiget (CLI or MCP server) writes a single line to stderr describing the resolved profile. Example:

[doiget] capability: oa=on metadata=[openalex] tdm=[elsevier(agreed=2026-05-05T08:00:00Z)]

The banner is on stderr in all modes, including MCP mode (where stdout is reserved for JSON-RPC). The banner does not include any portion of any API key.
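
// A sketch of how the banner line could be assembled. `format_banner` and
// `print_banner` are hypothetical names, the inputs are plain strings rather
// than the real profile types, and the comma separator between multiple
// entries is an assumption (the example above shows only one entry per list).
pub fn format_banner(metadata: &[&str], tdm: &[(&str, &str)]) -> String {
    // Each TDM entry carries only the source name and the agreement timestamp;
    // no key material is ever passed in, so none can leak into the banner.
    let tdm_parts: Vec<String> = tdm
        .iter()
        .map(|(name, agreed_at)| format!("{name}(agreed={agreed_at})"))
        .collect();
    format!(
        "[doiget] capability: oa=on metadata=[{}] tdm=[{}]",
        metadata.join(","),
        tdm_parts.join(",")
    )
}

/// Writes the banner to stderr, leaving stdout free for JSON-RPC in MCP mode.
pub fn print_banner(metadata: &[&str], tdm: &[(&str, &str)]) {
    eprintln!("{}", format_banner(metadata, tdm));
}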

6. Reload semantics

CapabilityProfile is immutable for the lifetime of the process. A change to DOIGET_AGREE_TDM_* or a key environment variable while the process is running has no effect; the user must restart. Freezing the profile at startup means the gate cannot be weakened, or left half-updated, partway through a session.
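
// One way to get resolve-once behaviour, sketched with `std::sync::OnceLock`.
// This is an assumption about mechanism, not a description of doiget-core:
// `FrozenProfileSketch` and `frozen_profile` are hypothetical, and the flag is
// passed in as an argument where real code would read the environment.
use std::sync::OnceLock;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FrozenProfileSketch {
    pub openalex: bool,
}

static PROFILE: OnceLock<FrozenProfileSketch> = OnceLock::new();

/// Returns the process-wide profile, initialising it on the first call only.
/// Later calls ignore their argument and return the value already stored.
pub fn frozen_profile(openalex_enabled: bool) -> &'static FrozenProfileSketch {
    PROFILE.get_or_init(|| FrozenProfileSketch { openalex: openalex_enabled })
}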

7. MCP tool exposure

The MCP tool doiget_capability_profile (see MCP_TOOLS.md) reports the current profile to agents in a redacted form (no API keys, just booleans / source names and agreed_at timestamps). Agents can use this to decide whether a fetch_paper(...) call against a TDM source will succeed before issuing it.
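
// A sketch of redaction by construction for the report described above: the
// reported type has no key field, so no key can be serialised by accident.
// `GrantSketch`, `RedactedGrant`, and `redact_grant` are hypothetical names,
// and timestamps are plain strings to keep the snippet free of dependencies.
pub struct GrantSketch {
    pub api_key: String,
    pub agreed_at: String,
}

#[derive(Debug, PartialEq, Eq)]
pub struct RedactedGrant {
    pub source: String,
    pub agreed_at: String,
}

/// Copies only the non-sensitive fields; `api_key` is never read.
pub fn redact_grant(source: &str, grant: &GrantSketch) -> RedactedGrant {
    RedactedGrant {
        source: source.to_string(),
        agreed_at: grant.agreed_at.clone(),
    }
}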


Source: site/content/developer/capability.md