Usage Guide
Global configuration
BiblioFetch reads ~/.config/bibliofetch/config.toml (or the path in the BIBLIOFETCH_CONFIG environment variable) for settings that are per-machine rather than per-project:
[defaults]
email = "you@example.com" # used for Unpaywall OA lookups
store_root = "~/papers" # fallback store for the `add`/`fetch` commands
[profiles.laptop] # matches gethostname() == "laptop"
proxy = "http://proxy.univ.example:8080"
[profiles.compute-node] # ssh host — reaches the proxy via reverse tunnel
proxy = "http://localhost:18080"
[profiles.default] # hosts that match no profile fall here
# (omit `proxy` here to operate in `:oa_only` mode: Unpaywall + arXiv only)
Check what BiblioFetch thinks it is running on:
$ bibliofetch env
BiblioFetch runtime
hostname : panza
profile : laptop
config : /home/you/.config/bibliofetch/config.toml
mode : direct
proxy : http://proxy.univ.example:8080 [profile]
reachable (via proxy): true
store root : /home/you/papers
email : you@example.com
`mode` is one of:
- `:direct`: proxy reachable, non-localhost host. All sources work (including the paywalled direct `doi.org` landing).
- `:tunneled`: proxy is `localhost`/`127.0.0.1`. Assumed to be an `ssh -R`-style reverse tunnel.
- `:oa_only`: no proxy (or proxy unreachable). Only Unpaywall + arXiv are tried. Paywalled articles are silently skipped because there is no route to them.
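The three modes amount to a small decision rule. The sketch below is one way to express it, not BiblioFetch's own code: `classify_mode` is a hypothetical name, and reachability is passed in as a plain boolean rather than probed.

```julia
# Hypothetical helper, not part of the BiblioFetch API.
# `proxy` is the configured proxy URL (or `nothing`); `reachable` says
# whether a connection to it succeeded.
function classify_mode(proxy::Union{Nothing,AbstractString}, reachable::Bool)
    (proxy === nothing || !reachable) && return :oa_only
    # Skip an optional scheme, then capture the host up to the port or path.
    m = match(r"^(?:\w+://)?([^:/]+)", proxy)
    host = m === nothing ? "" : lowercase(m.captures[1])
    return host in ("localhost", "127.0.0.1") ? :tunneled : :direct
end

classify_mode("http://proxy.univ.example:8080", true)   # :direct
classify_mode("http://localhost:18080", true)           # :tunneled
classify_mode("http://localhost:18080", false)          # :oa_only
classify_mode(nothing, false)                           # :oa_only
```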
Per-job TOML files
A job file tells BiblioFetch.run what to fetch, where to put it, and how (which sources, how many downloads in parallel, whether to write BibTeX). It has four sections:
[folder]
target = "./papers" # REQUIRED. PDFs and .metadata/ go here.
bibtex = "./refs.bib" # OPTIONAL. Generate a .bib after fetching.
[fetch]
email = "you@example.com" # overrides global config
parallel = 4 # concurrent downloads, default 1
force = false # re-download even if already present
sources = ["unpaywall", "arxiv", "direct"]
strict_duplicates = false # default: warn + keep first
[doi] # ungrouped — goes to {target}/
list = ["10.1103/PhysRevB.99.214433"]
[doi.condensed-matter] # goes to {target}/condensed-matter/
list = [
"arxiv:1106.6068",
"10.1103/PhysRevLett.96.110404",
]
[doi.condensed-matter.haldane] # nested — {target}/condensed-matter/haldane/
list = ["arxiv:cond-mat/0506438"]
[doi.ml] # {target}/ml/
list = ["arxiv:1706.03762"]
Running from the shell (auto-detects job.toml in cwd):
bibliofetch
or explicitly:
bibliofetch run job.toml
or from Julia:
result = BiblioFetch.run("job.toml")
Reference forms
The list accepts any of:
| Form | Example |
|---|---|
| Bare DOI | 10.1103/PhysRevB.99.214433 |
| DOI URL | https://doi.org/10.1103/PhysRevB.99.214433 |
| doi: prefix | doi:10.1103/... |
| New-style arXiv | 1706.03762, 1706.03762v2, arxiv:1706.03762 |
| Legacy arXiv | cond-mat/0608208 |
| arXiv URL | https://arxiv.org/abs/1706.03762 |
is_doi and is_arxiv predicates are available in the API if you need them.
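To make "normalized key" concrete, here is a minimal sketch that reduces the forms in the table to a `(kind, key)` pair. `normalize_ref` is a hypothetical name and the regular expressions are approximations; this is not how `is_doi` or `is_arxiv` are implemented. Lowercasing DOIs is an assumption of the sketch (DOIs are case-insensitive).

```julia
# Hypothetical normalizer for the reference forms listed above.
# Returns a (kind, key) tuple, or `nothing` if the form is not recognized.
function normalize_ref(ref::AbstractString)
    s = strip(ref)
    # Drop URL and scheme-style prefixes (case-insensitive).
    s = replace(s, r"^https?://(dx\.)?doi\.org/"i => "")
    s = replace(s, r"^doi:"i => "")
    s = replace(s, r"^https?://arxiv\.org/abs/"i => "")
    s = replace(s, r"^arxiv:"i => "")
    # DOI: "10.", a registrant code, "/", then a suffix without whitespace.
    occursin(r"^10\.\d{4,9}/\S+$", s) && return (:doi, lowercase(s))
    # New-style arXiv id: YYMM.NNNNN with an optional version suffix.
    occursin(r"^\d{4}\.\d{4,5}(v\d+)?$", s) && return (:arxiv, s)
    # Legacy arXiv id: archive(.SC)/YYMMNNN with an optional version suffix.
    occursin(r"^[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$", s) && return (:arxiv, s)
    return nothing
end

normalize_ref("https://doi.org/10.1103/PhysRevB.99.214433")  # (:doi, "10.1103/physrevb.99.214433")
normalize_ref("arxiv:1706.03762")                            # (:arxiv, "1706.03762")
normalize_ref("cond-mat/0608208")                            # (:arxiv, "cond-mat/0608208")
```

With a normalizer like this, the DOI URL, the `doi:` prefix, and the bare DOI all map to the same key, which is what makes duplicate detection across groups possible.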
Duplicate handling
If the same normalized key appears in two groups:
- Default (`strict_duplicates = false`): emit a warning and fetch only once, assigning the reference to the first group it appeared in. The `FetchJob.duplicates` field records every rejection.
- `strict_duplicates = true`: `load_job` throws `ArgumentError` before any network activity.
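The two policies can be sketched as follows. This is an illustration under stated assumptions, not the `load_job` implementation: `assign_groups`, the shape of `groups`, and the use of `""` for the ungrouped list are all hypothetical.

```julia
# Hypothetical sketch of the two duplicate policies.
# `groups` lists each group name with its already-normalized keys, in file order.
function assign_groups(groups; strict::Bool=false)
    owner = Dict{String,String}()                 # key => first group that listed it
    duplicates = Tuple{String,String,String}[]    # (key, kept_in, rejected_from)
    for (group, refs) in groups, key in refs
        if haskey(owner, key)
            strict && throw(ArgumentError(
                "duplicate reference $key in groups $(repr(owner[key])) and $(repr(group))"))
            @warn "duplicate reference, keeping the first occurrence" key kept_in=owner[key] rejected_from=group
            push!(duplicates, (key, owner[key], group))
        else
            owner[key] = group
        end
    end
    return owner, duplicates
end

groups = [
    "" => ["10.1103/physrevb.99.214433"],                  # ungrouped
    "ml" => ["1706.03762", "10.1103/physrevb.99.214433"],  # repeats the DOI above
]
owner, duplicates = assign_groups(groups)   # warns once; the DOI stays ungrouped
```

Calling `assign_groups(groups; strict=true)` on the same input throws `ArgumentError` instead of warning.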
Source filtering
sources = [...] restricts which routes are tried and in what order. Typical choices:
- `["arxiv"]`: arXiv only. Good for a list of preprints.
- `["unpaywall", "arxiv"]`: legal OA only; skip the paywalled direct landing.
- `["direct"]`: force-test whether your proxy route works end to end.
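The "tried in order" behaviour can be pictured as a loop that stops at the first source that succeeds. The sketch below assumes exactly that; `try_sources` and the stub fetchers are hypothetical stand-ins, and real fetchers would perform network requests.

```julia
# Hypothetical sketch: try each source in the given order, stop at the first
# success, and record what was attempted along the way.
function try_sources(key, sources, fetchers)
    attempts = Pair{String,Symbol}[]
    for name in sources
        fetcher = get(fetchers, name, nothing)
        fetcher === nothing && throw(ArgumentError("unknown source: $name"))
        path = fetcher(key)                     # a path on success, `nothing` on failure
        push!(attempts, name => (path === nothing ? :failed : :ok))
        path === nothing || return (path, attempts)
    end
    return (nothing, attempts)
end

# Stub fetchers for illustration only.
fetchers = Dict{String,Function}(
    "unpaywall" => key -> nothing,
    "arxiv"     => key -> startswith(key, "arxiv:") ? "arxiv.pdf" : nothing,
    "direct"    => key -> "direct.pdf",
)

path, attempts = try_sources("arxiv:1706.03762", ["unpaywall", "arxiv", "direct"], fetchers)
# path == "arxiv.pdf"; "direct" is never tried because "arxiv" succeeded
```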
Vault — topic collections
A vault is a directory of topic TOML files, by default ~/.config/bibliofetch/vault/ (override with $BIBLIOFETCH_VAULT). Each file defines a named topic:
# ~/.config/bibliofetch/vault/mps-algorithms.toml
[topic]
name = "MPS Algorithms"
tags = ["tensor-network", "dmrg"]
notes = "Core MPS/DMRG references"
[doi]
list = [
"arxiv:cond-mat/0407066",
"10.1103/RevModPhys.93.045003",
]
An optional vault.toml pins the shared store path and topic order:
# ~/.config/bibliofetch/vault/vault.toml
topics = ["mps-algorithms.toml", "dmrg-foundations.toml"]
store = "~/papers/vault"
Vault CLI
bibliofetch vault ls # list all topics
bibliofetch vault add arxiv:1234.5678 --topic mps-algorithms
bibliofetch vault fetch # fetch all topics
bibliofetch vault fetch mps-algorithms # fetch one topic
bibliofetch vault bib # export all as vault.bib
bibliofetch vault bib mps-algorithms # export one topic
bibliofetch vault search "DMRG" # search across all vault papers
Inheriting vault refs in a project
A project job.toml can pull in vault topics without duplicating files on disk:
[vault]
inherit = ["mps-algorithms", "quantum-stat-mech"]
[doi]
list = ["10.1103/PhysRevB.99.214433"] # project-specific only
bibliofetch bib then outputs a single .bib covering both vault and project references.
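As a rough picture of what inheritance means for the reference list, the sketch below merges the `[doi].list` of each inherited topic with the project's own list and drops repeats. It uses only Julia's `TOML` standard library. `merged_refs` is a hypothetical helper, topic files are passed as strings for simplicity, nested `[doi.<group>]` tables are ignored, and the vault-first ordering is an assumption.

```julia
using TOML

# Hypothetical sketch: the references a project sees after inheriting topics.
function merged_refs(project_toml::AbstractString, topics::Dict{String,String})
    project = TOML.parse(project_toml)
    refs = String[]
    for name in get(get(project, "vault", Dict()), "inherit", String[])
        topic = TOML.parse(topics[name])        # KeyError if the topic is unknown
        append!(refs, get(get(topic, "doi", Dict()), "list", String[]))
    end
    append!(refs, get(get(project, "doi", Dict()), "list", String[]))
    return unique(refs)                         # keeps the first occurrence, preserves order
end

project = """
[vault]
inherit = ["mps-algorithms"]

[doi]
list = ["10.1103/PhysRevB.99.214433", "arxiv:cond-mat/0407066"]
"""

topics = Dict("mps-algorithms" => """
[doi]
list = ["arxiv:cond-mat/0407066", "10.1103/RevModPhys.93.045003"]
""")

merged_refs(project, topics)
```

Here `arxiv:cond-mat/0407066` appears in both the topic and the project but ends up in the merged list only once.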
Annotations
Every paper in the store can carry tags, notes, reading status, and a starred flag. These live in .metadata/<safekey>.toml alongside the fetch metadata:
tags = ["mps", "finite-temperature"]
notes = "Definition 2.3 is the key result; see Eq. (15)."
read_status = "read" # unread | reading | read | skimmed
starred = true
Edit them with:
bibliofetch annotate 10.1103/PhysRevB.99.214433 # opens $EDITOR
Filter the listing by annotation:
bibliofetch ls --tag mps
bibliofetch ls --unread
bibliofetch ls --starred
SSH reverse tunnel: running on a compute node
Forward your university proxy to a compute host that cannot reach it directly:
# ~/.ssh/config
Host compute-node
HostName compute.internal
User you
RemoteForward 18080 proxy.univ.example:8080
On compute-node, put this in ~/.config/bibliofetch/config.toml:
[defaults]
email = "you@example.com"
[profiles.compute-node]
proxy = "http://localhost:18080"
BiblioFetch detects the localhost proxy and classifies the mode as :tunneled automatically. Close the ssh session and the tunnel closes with it; BiblioFetch falls back to :oa_only mode until you reconnect.
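If you want to check from Julia whether the tunnel is currently up, a plain TCP probe of the forwarded port is usually enough. The sketch below uses the `Sockets` standard library. `port_open` is a hypothetical helper, not a BiblioFetch function, and a successful connection only shows that something is listening on the port, not that the far end is really the proxy.

```julia
using Sockets

# Hypothetical probe: true if something accepts TCP connections on host:port.
function port_open(host::AbstractString, port::Integer)
    try
        sock = Sockets.connect(host, port)
        close(sock)
        return true
    catch err
        # Connection refused or name lookup failure means "not reachable";
        # anything else is unexpected and is passed on.
        err isa Base.IOError || err isa Sockets.DNSError || rethrow()
        return false
    end
end

if port_open("localhost", 18080)
    println("tunnel port is open")
else
    println("tunnel port is closed; expect :oa_only until you reconnect")
end
```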
Programmatic use
using BiblioFetch
# Build a Store manually (same layout, no job file)
rt = detect_environment()
store = open_store(rt.store_root)
# Fetch one paper into a subdirectory
fetch_paper!(store, "arxiv:1706.03762"; rt=rt, group="ml")
# Load a job + inspect before running
job = load_job("job.toml")
result = BiblioFetch.run(job; verbose=false)
for e in result.entries
    e.status === :failed || continue
    println(e.key, ": ", e.error)
    for a in e.attempts
        println(" ", a.source, " (", a.duration_s, "s): ",
                a.error === nothing ? "ok" : a.error)
    end
end
# Vault programmatic access
using BiblioFetch: load_vault_index, vault_fetch!, vault_bib
index = load_vault_index()
vault_fetch!(index; verbose=true)
vault_bib(index; out="vault.bib")
CLI cheatsheet
Common:
bibliofetch # run job.toml in cwd
bibliofetch run <job.toml> # explicit job file
bibliofetch add <ref> # queue into the global store
bibliofetch sync # fetch pending refs in global store
bibliofetch fetch <ref> # one-shot into global store
bibliofetch ls # list all entries
bibliofetch ls --tag <tag> # filter by tag
bibliofetch ls --unread # filter by read_status
bibliofetch ls --starred # filter starred papers
bibliofetch annotate <ref> # edit tags/notes/status in $EDITOR
bibliofetch bib [--out path] # regenerate refs.bib
bibliofetch search <query> # full-text search
bibliofetch env # show detected runtime
Vault:
bibliofetch vault ls # list topics
bibliofetch vault fetch [<topic>] # fetch all or one topic
bibliofetch vault bib [<topic>] # BibTeX for all or one topic
bibliofetch vault add <ref> --topic <t> # add ref to a topic
bibliofetch vault search <query> # search across vault
Advanced:
bibliofetch info <ref> # show stored metadata
bibliofetch graph [--format dot|mermaid] <dir>
bibliofetch stats [<dir>]
bibliofetch dedup [--resolve] <dir>
bibliofetch doctor [--fix] <dir>
bibliofetch init <path> # create project skeleton