pub struct HttpClient { /* private fields */ }
Workspace-wide HTTP client with the security defaults applied.
Internally holds one reqwest::Client per source. Construct via
HttpClient::new with the full set of allowlists the calling process
will need.
Implementations
impl HttpClient
pub fn new(allowlists: Vec<SourceAllowlist>) -> Result<Self, Error>
Build a client with rustls TLS, a per-source redirect allowlist, a response-size cap, and timeouts.
allowlists MUST cover every source whose URL might be passed in;
fetches against unregistered sources return
HttpError::UnknownSource.
Errors
Returns the underlying reqwest::Error if ClientBuilder::build
fails (typically a TLS-backend init failure).
pub fn source_allowlist(&self, source: &str) -> Option<&SourceAllowlist>
The SourceAllowlist this client was built with for source, or
None if source was not registered.
This is the same value the per-source redirect closure captures
(see HttpClient’s allowlists field doc). It exists so that the
orchestrator can apply the docs/REDIRECT_ALLOWLIST.md §1 pre-fetch
host check to a metadata-discovered OA URL, which may be fetched
directly without passing through any redirect hop. Because the check
uses the same source of truth as the redirect closure, the two can
never disagree. Callers MUST use this only for the "oa-publisher"
leg; the initial template-constructed URL is exempt per
docs/REDIRECT_ALLOWLIST.md §6.
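A minimal sketch of the pre-fetch check described above, in plain std Rust. Allowlist, Registry, PrecheckError, and precheck_oa_host are hypothetical stand-ins for SourceAllowlist and the orchestrator's own logic, and exact, case-insensitive host matching is an assumption; the real rules live in docs/REDIRECT_ALLOWLIST.md. The point it illustrates is that the check reads the same per-source value that source_allowlist returns, and treats a missing "oa-publisher" entry as a failure rather than a pass.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `SourceAllowlist`: a list of permitted hosts.
struct Allowlist {
    hosts: Vec<String>,
}

impl Allowlist {
    /// Assumption: hosts match exactly, ignoring ASCII case.
    fn permits_host(&self, host: &str) -> bool {
        self.hosts.iter().any(|h| h.eq_ignore_ascii_case(host))
    }
}

/// Hypothetical registry mirroring `HttpClient::source_allowlist`.
struct Registry {
    allowlists: HashMap<String, Allowlist>,
}

impl Registry {
    /// `None` when the source was never registered.
    fn source_allowlist(&self, source: &str) -> Option<&Allowlist> {
        self.allowlists.get(source)
    }
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum PrecheckError {
    UnknownSource,
    HostNotAllowed,
}

/// Pre-fetch host check for a metadata-discovered OA URL.
fn precheck_oa_host(registry: &Registry, host: &str) -> Result<(), PrecheckError> {
    // A missing registration fails closed instead of skipping the check.
    let allowlist = registry
        .source_allowlist("oa-publisher")
        .ok_or(PrecheckError::UnknownSource)?;
    if allowlist.permits_host(host) {
        Ok(())
    } else {
        Err(PrecheckError::HostNotAllowed)
    }
}
```

Failing closed on an unregistered source mirrors the way fetches against unregistered sources return HttpError::UnknownSource.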
pub async fn fetch_bytes(
    &self,
    source: &str,
    url: Url,
) -> Result<(Bytes, Url), HttpError>
Fetch a URL, treating the response as a JSON or text body. The body is
capped at PDF_MAX_BYTES.
Returns the response body bytes plus the effective final URL after redirects. Every hop has already passed allowlist verification by the time this returns.
Errors
Any HttpError variant.
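A sketch of how a body cap such as PDF_MAX_BYTES can be enforced without buffering an unbounded response. It uses std::io::Read as a stand-in for the response stream, and read_capped and CapError are hypothetical names; the real client may enforce its cap differently. Reading one byte past the limit distinguishes a body that is exactly at the cap from one that exceeds it.

```rust
use std::io::{self, Read};

/// Hypothetical error type for this sketch.
#[derive(Debug)]
enum CapError {
    TooLarge { limit: usize },
    Io(io::Error),
}

/// Reads at most `max_bytes` from `reader`. A longer body is rejected
/// rather than silently truncated.
fn read_capped<R: Read>(reader: R, max_bytes: usize) -> Result<Vec<u8>, CapError> {
    let mut buf = Vec::new();
    // Allow one extra byte so "exactly at the cap" and "over the cap" differ.
    let limit = (max_bytes as u64).saturating_add(1);
    reader
        .take(limit)
        .read_to_end(&mut buf)
        .map_err(CapError::Io)?;
    if buf.len() > max_bytes {
        return Err(CapError::TooLarge { limit: max_bytes });
    }
    Ok(buf)
}
```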
pub async fn fetch_bytes_with_headers(
    &self,
    source: &str,
    url: Url,
    headers: &[(&str, &str)],
) -> Result<(Bytes, Url), HttpError>
Like Self::fetch_bytes, but attaches additional request
headers to the outgoing GET. The headers are validated up front
against the visible-ASCII subset (RFC 7230 §3.2); any failure
returns HttpError::InvalidHeader before the request is sent.
Used by Tier-3 TDM sources that authenticate via a header
(APS Harvest X-API-Key, Elsevier ScienceDirect X-ELS-APIKey).
Header values are sent on the wire only; they are never logged.
Errors
Any HttpError variant including HttpError::InvalidHeader.
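A sketch of the up-front header check, written against the RFC 7230 §3.2 grammar. It assumes names must be tokens (tchar) and values may contain only visible ASCII (VCHAR, 0x21 to 0x7E) plus interior spaces or tabs; validate_header, validate_all, and HeaderError are hypothetical, standing in for the crate's own check and HttpError::InvalidHeader. The error carries only the header name, so a rejected API key is never echoed into logs.

```rust
/// Hypothetical error type for this sketch; it never stores a header value.
#[derive(Debug, PartialEq)]
enum HeaderError {
    BadName(String),
    BadValue(String),
}

/// RFC 7230 `tchar`: characters allowed in a header field name.
fn is_tchar(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b"!#$%&'*+-.^_`|~".contains(&b)
}

/// Visible ASCII (`VCHAR`, 0x21..=0x7E).
fn is_vchar(b: u8) -> bool {
    (0x21..=0x7E).contains(&b)
}

fn validate_header(name: &str, value: &str) -> Result<(), HeaderError> {
    if name.is_empty() || !name.bytes().all(is_tchar) {
        return Err(HeaderError::BadName(name.to_string()));
    }
    let v = value.as_bytes();
    // Interior bytes: VCHAR, space, or tab (assumption: no obs-text).
    let inner_ok = v.iter().all(|&b| is_vchar(b) || b == b' ' || b == b'\t');
    // No leading or trailing whitespace.
    let edges_ok = v.first().map_or(true, |&b| is_vchar(b))
        && v.last().map_or(true, |&b| is_vchar(b));
    if !(inner_ok && edges_ok) {
        // Report the name only: the value may be an API key.
        return Err(HeaderError::BadValue(name.to_string()));
    }
    Ok(())
}

/// Validates every header before anything is sent.
fn validate_all(headers: &[(&str, &str)]) -> Result<(), HeaderError> {
    headers.iter().try_for_each(|(n, v)| validate_header(n, v))
}
```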
pub async fn fetch_pdf(
    &self,
    source: &str,
    url: Url,
) -> Result<(Bytes, Url), HttpError>
Fetch a URL expected to be a PDF. Same as Self::fetch_bytes, plus
a magic-byte check on the first 5 bytes
(%PDF- = [0x25, 0x50, 0x44, 0x46, 0x2D]). A mismatch returns
HttpError::NotAPdf.
Errors
Any HttpError variant including HttpError::NotAPdf.
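The magic-byte check amounts to a prefix comparison on the body. A minimal sketch; PDF_MAGIC, check_pdf_magic, and PdfCheckError are hypothetical names standing in for the crate's own check and HttpError::NotAPdf.

```rust
/// The five-byte PDF signature `%PDF-`.
const PDF_MAGIC: [u8; 5] = [0x25, 0x50, 0x44, 0x46, 0x2D];

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum PdfCheckError {
    NotAPdf,
}

fn check_pdf_magic(body: &[u8]) -> Result<(), PdfCheckError> {
    // `starts_with` is false for bodies shorter than five bytes,
    // so truncated responses are rejected too.
    if body.starts_with(&PDF_MAGIC) {
        Ok(())
    } else {
        Err(PdfCheckError::NotAPdf)
    }
}
```

This catches the common failure where a publisher answers a PDF request with an HTML landing or login page.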
impl HttpClient
Test-oriented HttpClient constructors. Originally cfg(test); now
also reachable from the doiget-cli orchestrator’s integration tests
(which live outside this crate and therefore cannot see cfg(test)-gated
items). The constructor names retain the for_tests_allow_http signal;
production code MUST use HttpClient::new with tier_1_allowlist.
pub fn new_for_tests_allow_http(source: &str, allowlist_host: &str) -> Self
Build a test-oriented HttpClient against an http:// wiremock
origin. The redirect closure still rejects insecure schemes; only
https_only is relaxed, at the connection level, so wiremock can serve
the initial request.
This is acceptable because the redirect closure (the
security-load-bearing path) is exercised by the
redirect_to_http_is_rejected_by_closure test.
Production callers MUST use HttpClient::new with
tier_1_allowlist; the for_tests_allow_http suffix is the load-bearing
signal that this constructor lifts the initial-leg HTTPS-only
requirement.
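A sketch of why relaxing https_only stays confined to the initial leg: the connection-level setting and the redirect decision are separate checks, and only the first is relaxed for tests. ConnectionPolicy and redirect_hop_allowed are hypothetical, and exact, case-insensitive host matching is an assumption; in the real client the second check lives in the per-source redirect closure.

```rust
/// Hypothetical connection-level setting; relaxing it affects only
/// the request the caller issues directly.
struct ConnectionPolicy {
    https_only: bool,
}

impl ConnectionPolicy {
    fn initial_leg_allowed(&self, scheme: &str) -> bool {
        scheme == "https" || !self.https_only
    }
}

/// Hypothetical redirect check, independent of `ConnectionPolicy`:
/// every hop must be HTTPS and land on an allowlisted host.
fn redirect_hop_allowed(scheme: &str, host: &str, allowed_hosts: &[&str]) -> bool {
    scheme == "https" && allowed_hosts.iter().any(|h| h.eq_ignore_ascii_case(host))
}
```

With https_only relaxed, a test can reach an http:// mock server on the first request, yet any redirect to an http:// URL is still refused.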
pub fn new_for_tests_allow_http_multi(entries: &[(&str, &str)]) -> Self
Multi-source variant of HttpClient::new_for_tests_allow_http.
Builds a relaxed-https_only client per (source, allowlist_host)
pair. Used by the doiget-cli orchestrator’s integration tests when
more than one upstream needs to be wiremocked simultaneously
(e.g. Crossref + Unpaywall against two different mock servers).
Production callers MUST use HttpClient::new with
tier_1_allowlist.
Trait Implementations
impl Clone for HttpClient
fn clone(&self) -> HttpClient
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.