Deterministic, verifiable data for AI agents.
Pinakes is a single signed static Go binary — with zero runtime dependencies — that gives AI agents reproducible, complete, and verifiable access to public scientific databases. It is a CLI, a Model Context Protocol stdio server, and a local REST server, all in one executable.
Named for the Pínakes, the catalogue of the Library of Alexandria — the first index of all knowledge.
The problem: ask a public biological database the same question twice and you can get different result sets. Tools paginate and truncate differently, and the databases expose no stable order. An agent's analysis silently inherits that drift, and nothing flags it.
The fix: every Pinakes query pins a content-addressed snapshot, so results are byte-identical forever; retrieval is complete-or-fail (the returned count is reconciled against the source's authoritative total, and a short set fails loudly, never silently); and every result ships a re-runnable manifest that `pinakes verify` re-derives offline to prove the result is exactly what it claims. It is git for scientific data.
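Why a pinned set can be byte-identical even when the upstream database reorders its results comes down to imposing an order before hashing. The Go sketch below illustrates that idea only; it is not Pinakes' code. The `Record` type, sorting by a unique `ID`, and hashing each record's JSON encoding with SHA-256 are all assumptions chosen for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// Record is a hypothetical normalized record; real Pinakes records differ.
type Record struct {
	ID       string `json:"id"`
	Sequence string `json:"sequence"`
}

// recordSetHash sorts a copy of the records by ID (assumed unique), then
// hashes their JSON encoding, so upstream response order cannot change the digest.
func recordSetHash(records []Record) (string, error) {
	sorted := append([]Record(nil), records...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })

	h := sha256.New()
	enc := json.NewEncoder(h) // writes one JSON object plus newline per record into the hash
	for _, r := range sorted {
		if err := enc.Encode(r); err != nil {
			return "", fmt.Errorf("encode %s: %w", r.ID, err)
		}
	}
	return "sha256:" + hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	a := []Record{{"B", "MKV"}, {"A", "MST"}}
	b := []Record{{"A", "MST"}, {"B", "MKV"}} // same set, different upstream order
	ha, _ := recordSetHash(a)
	hb, _ := recordSetHash(b)
	fmt.Println(ha == hb) // true
}
```

The sort happens on a copy, so the caller's slice keeps whatever order the source returned.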
Install the single static binary:
```sh
curl -fsSL https://get.pinakes.sh | sh
```

No token, no Docker, no network at query time beyond the source databases themselves. The install script verifies the release's SHA-256 checksum (and its cosign keyless signature when `cosign` is present) before installing anything.
Then register `pinakes mcp` with your MCP client. Any client works (Claude, Cursor, Cline, Windsurf, Zed, …); the standard config block is shown below. In Claude Code, for example, it is one line: `claude mcp add --scope user pinakes -- pinakes mcp`.
Homebrew (coming soon).
`brew install pinakes-sh/tap/pinakes` is pending the public Homebrew tap and is not live yet; use the `curl | sh` line above for now.
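The checksum step the install script performs boils down to hashing the downloaded archive and comparing the digest with the published value. A minimal Go sketch of that comparison follows; the function name is hypothetical, and the real script (a shell script) and its checksum-file format may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifySHA256 (hypothetical helper) returns an error unless data hashes to wantHex.
func verifySHA256(data []byte, wantHex string) error {
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	if !strings.EqualFold(got, strings.TrimSpace(wantHex)) {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

func main() {
	archive := []byte("example archive bytes") // stand-in for the downloaded release
	sum := sha256.Sum256(archive)
	want := hex.EncodeToString(sum[:])
	fmt.Println(verifySHA256(archive, want))                    // <nil>
	fmt.Println(verifySHA256([]byte("tampered"), want) != nil) // true
}
```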
Most clients (Claude Desktop, Cursor, Windsurf, Zed, VS Code via .mcp.json, …) read a standard config block. Add Pinakes to it:
```json
{
  "mcpServers": {
    "pinakes": {
      "command": "pinakes",
      "args": ["mcp"]
    }
  }
}
```

No env, no token, no Docker: the server runs fully locally, and its only network traffic is to the source databases themselves. If your GUI app can't find `pinakes` on its `PATH`, replace `"command": "pinakes"` with the absolute path printed by `which pinakes`.
Go toolchain:

```sh
go install pinakes.sh/pinakes/cmd/pinakes@latest
```

After installing, confirm the install by listing the catalogue:
```sh
pinakes catalog
```

Pinakes works fully without any key. The only sources that benefit today are the two NCBI sources (`ncbi-protein`, `ncbi-virus`): a free NCBI API key raises your own upstream rate limit (E-utilities and the NCBI Datasets API both rise to about 10 requests/sec with a key, versus 3–5 without).
A key affects rate limits only — never the results, their order, or their hashes. Pinakes attaches the key to the outbound HTTP request alone; it never enters a snapshot, a record, or the reproducibility manifest, so a query is byte-identical and reproducible with or without a key.
The single ncbi alias covers both NCBI sources. Set it via the scoped CLI (the key is read from stdin, never passed as an argument, and stored at ~/.config/pinakes/config.yaml with 0600 permissions):
```sh
pinakes config set-key ncbi   # paste at the prompt, or: < keyfile
pinakes config list           # shows which sources have a key (never the value)
```

Or supply it via the environment (overrides the file; nothing on disk):

```sh
export PINAKES_NCBI_API_KEY=…   # honored by both NCBI sources
```

Get a free key from your NCBI account → Settings → API Key Management. The key is never exposed to the model or the MCP surface: `pinakes config` is a local CLI only, and `get`/`list` redact the value unless you pass `--reveal`.
Pinakes stores its content-addressed snapshots and reproducibility manifests under your OS cache
directory by default — ~/Library/Caches/pinakes on macOS, $XDG_CACHE_HOME/pinakes (else
~/.cache/pinakes) on Linux. It is created on first use; back it up or clear it like any cache.
Relocate it with PINAKES_STATE_ROOT:
```sh
export PINAKES_STATE_ROOT=/path/to/pinakes-state   # where the local snapshot store lives
```

Everything stays local: a pinned query and its `pinakes verify` re-run read this directory with no network access.
Three guarantees, none of which a raw API call or a typical MCP wrapper gives you:
- Determinism. Every query pins a content-addressed snapshot of the source. The same pin returns one byte-identical set of normalized records — every run, forever — even when the underlying database reorders or grows. Where a source exposes no stable sort, Pinakes imposes one before hashing, so a pinned re-run cannot diverge.
- Complete-or-fail. At capture, the retrieved count is reconciled against the source's own authoritative total. If the set is not provably complete, the query fails loudly instead of quietly handing back a shorter one. Silent truncation is the bug Pinakes exists to kill.
- Verify, don't trust. Every result ships a re-runnable provenance manifest. `pinakes verify` re-derives the count and the record hash from the pinned snapshot offline; exit 0 means the snapshot reproduces exactly the result the manifest claims. A shortened or altered snapshot is refused.
Pinakes fixes the input layer. It does not run analysis, does not do phylogenetics, and never claims a source is "wrong": the public databases are authoritative. What it fixes is naive client retrieval, which imposes no order, does no reconciliation, and records no provenance.
The MCP server exposes eight verbs. Each is also a CLI subcommand and a REST endpoint.
| Tool | What it does |
|---|---|
| `catalog` | List available sources and their filter schema. |
| `search` | Run a filter-driven query against a source. |
| `get` | Fetch a record by identifier. |
| `resolve` | Map an identifier from one namespace into another. |
| `estimate` | Dry-run a plan (counts and cost) with no retrieval. |
| `export` | Materialize a result set to a format. |
| `jobs` | Check the status of an async job. |
| `verify` | Prove, offline, that a manifest reproduces. |
26 public databases — proteins and structures (UniProt, PDB, AlphaFold, InterPro), variants and genetics (ClinVar, gnomAD, GWAS Catalog, Ensembl), chemistry and drugs (ChEMBL, PubChem, Guide to Pharmacology, openFDA), pathways and ontologies (Reactome, Gene Ontology, HPO, Rhea, Monarch), expression and disease (GTEx, Open Targets, cBioPortal, ClinicalTrials.gov), and more.
pinakes catalog is the authoritative, live list — every source with its filter schema, maturity, license, and required attribution. The README does not duplicate it; run it to see exactly what your binary serves:
```sh
pinakes catalog
```

Pinakes offers a trust chain that no npx- or Docker-based MCP server can:
- Signed binary. Releases are cosign keyless-signed (Sigstore + GitHub OIDC) and ship with an SBOM. The install script verifies the checksum (and the cosign signature when `cosign` is present) before anything lands on your machine.
- Offline data provenance. After install, `pinakes verify` proves data provenance: it re-derives a result from its pinned snapshot with no network access and refuses any manifest whose snapshot has been tampered with.
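Conceptually, offline verification is a recomputation: count the records in the pinned snapshot, re-hash them, and compare both values with what the manifest claims. The Go sketch below shows that check under simplifying assumptions: records are single-line canonical JSON strings, the `Manifest` has only two fields, and `deriveHash` and `verify` are hypothetical names. The real manifest carries more than this.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Manifest is a hypothetical two-field subset of a reproducibility manifest.
type Manifest struct {
	Count      int
	RecordHash string
}

// deriveHash hashes records in sorted order; each record is assumed to be a
// single-line canonical JSON string, so a newline separator is unambiguous.
func deriveHash(records []string) string {
	sorted := append([]string(nil), records...)
	sort.Strings(sorted)
	h := sha256.New()
	for _, r := range sorted {
		h.Write([]byte(r))
		h.Write([]byte{'\n'})
	}
	return "sha256:" + hex.EncodeToString(h.Sum(nil))
}

// verify re-derives count and hash from the local snapshot and refuses any mismatch.
func verify(m Manifest, snapshot []string) error {
	if got := len(snapshot); got != m.Count {
		return fmt.Errorf("refused: snapshot has %d records, manifest claims %d", got, m.Count)
	}
	if got := deriveHash(snapshot); got != m.RecordHash {
		return fmt.Errorf("refused: record hash %s does not match manifest %s", got, m.RecordHash)
	}
	return nil
}

func main() {
	snap := []string{`{"id":"A"}`, `{"id":"B"}`}
	m := Manifest{Count: 2, RecordHash: deriveHash(snap)}
	fmt.Println(verify(m, snap))                                        // <nil>
	fmt.Println(verify(m, snap[:1]) != nil)                             // true: shortened snapshot
	fmt.Println(verify(m, []string{`{"id":"A"}`, `{"id":"X"}`}) != nil) // true: altered record
}
```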
case-studies/ebola/ is a worked case study on real data — a pinned, Complete (32/32), byte-identical, offline-verify-proven snapshot of the complete-genome records for Zaire ebolavirus from NCBI Virus. It runs the pinned query twice (hashes identical) and verifies it, entirely offline. CI denies all network access and asserts that the result reproduces byte-for-byte, is Complete, verifies, and that a tampered manifest is refused.
```sh
./case-studies/ebola/run.sh
```

A pinned, complete, verifiable search, and the offline proof:

```sh
pinakes search --source-id ncbi-virus \
  --filters organism_taxon_id:eq:186538 \
  --filters complete_only:eq:true \
  --filters released_since:gte:2023-01-01 \
  --snapshot-version sha256:b1983c09…   # → 32 records, Complete (32/32)
pinakes verify --manifest @manifest.json   # → {"verified": true}, offline
```

Once registered, an agent calls the verbs by name. A typical search call:
```json
{
  "name": "pinakes_search",
  "arguments": {
    "source_id": "uniprot",
    "filters": [
      { "field": "reviewed", "operator": "eq", "value": "true" },
      { "field": "organism_taxon_id", "operator": "in", "value": "9606" }
    ]
  }
}
```

The agent gets back normalized records plus a manifest it can later hand to `pinakes_verify` to prove, reproducibly and offline, that the set is exactly what it claims.
Pinakes is self-describing — an agent needs no external docs beyond the MCP tool schemas:
- Call `catalog` to discover the live sources and each source's filter schema (field names, allowed operators, enums). This is how an agent learns what it can query.
- Call `search`/`get`/`resolve` with filters validated against that schema. An unknown field or bad value is rejected locally with a precise code (`UNKNOWN_FIELD`/`INVALID_FILTER`/`BAD_ENUM`) before any network call: an actionable correction, not a stack trace.
- Keep the returned `manifest` and call `verify` to prove the result reproduces.
Every tool's name, parameters, and description are served via MCP tools/list, so the full contract is in-band — no out-of-band documentation required.
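Local validation can be pictured as a schema lookup that runs before any network call. The Go sketch parses the CLI's `field:operator:value` form (as in the `--filters` flags of the Ebola example) and returns the three codes named above. The schema shape, the example fields, and the mapping of each failure to a specific code are assumptions for illustration; the real schema is whatever `catalog` returns.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldSpec is a hypothetical per-field entry in a source's filter schema.
type fieldSpec struct {
	Operators map[string]bool
	Enum      []string // empty means any value is allowed
}

// FilterError carries a machine-readable code plus a human-readable hint.
type FilterError struct {
	Code, Msg string
}

func (e *FilterError) Error() string { return e.Code + ": " + e.Msg }

// validate parses "field:operator:value" and checks it against schema.
func validate(schema map[string]fieldSpec, raw string) error {
	parts := strings.SplitN(raw, ":", 3)
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" {
		return &FilterError{"INVALID_FILTER", fmt.Sprintf("%q is not field:operator:value", raw)}
	}
	field, op, value := parts[0], parts[1], parts[2]
	spec, ok := schema[field]
	if !ok {
		return &FilterError{"UNKNOWN_FIELD", fmt.Sprintf("no field %q in this source", field)}
	}
	if !spec.Operators[op] {
		return &FilterError{"INVALID_FILTER", fmt.Sprintf("operator %q not allowed on %q", op, field)}
	}
	if len(spec.Enum) > 0 {
		for _, allowed := range spec.Enum {
			if value == allowed {
				return nil
			}
		}
		return &FilterError{"BAD_ENUM", fmt.Sprintf("%q must be one of %v", field, spec.Enum)}
	}
	return nil
}

func main() {
	schema := map[string]fieldSpec{ // hypothetical schema fragment
		"organism_taxon_id": {Operators: map[string]bool{"eq": true, "in": true}},
		"complete_only":     {Operators: map[string]bool{"eq": true}, Enum: []string{"true", "false"}},
	}
	for _, f := range []string{"organism_taxon_id:eq:186538", "complete_only:eq:maybe", "not_a_field:eq:x"} {
		fmt.Println(f, "->", validate(schema, f))
	}
}
```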
- Website: pinakes.sh
- CLI: `pinakes` · Go module: `pinakes.sh/pinakes`
- Reproducibility case study: `case-studies/ebola/README.md`
- Determinism contract: CONTRACTS.md
- Release process: RELEASING.md · Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md · Code of conduct: CODE_OF_CONDUCT.md
- Security: report vulnerabilities per SECURITY.md
A future hosted tier (accounts, metering) will use
PINAKES_API_KEY. The local binary in this repository needs no key — it is the open, local engine, fully usable offline.
Pinakes follows SemVer. Tags are vMAJOR.MINOR.PATCH.
Two things version separately:
- The determinism contract is frozen at `ManifestSchemaVersion` 1.0.0. A pinned query and the manifest that `pinakes verify` re-derives stay reproducible across releases. See CONTRACTS.md for what the contract covers and the rules for ever changing it.
- The CLI, MCP, and REST surfaces are still on the `0.x` line. Flags, tool names, and endpoints may change before `1.0.0`; surface stability is promised only at `1.0.0`. The contract above does not move when a surface does.
Apache License 2.0. See LICENSE.
