Nexus Gap Fixer
A local diagnostic tool that turns a journal article PDF/DOCX into a Research Nexus scorecard plus a Crossref-ready DOI deposit XML. Self-hosted. AGPL-3.0. No vendor lock-in.
Where the score tells you what's missing, this tool helps you fix it — one paper at a time, with every paid LLM call metered in USD and gated behind an explicit editor click.
Pilot pricing — $1 per article
The $1/article fee is sustenance only — to keep the project alive. It covers infrastructure, identifier-API polite-pool registration, and the time to keep extractors current with Crossref schema changes. The software itself is AGPL-3.0; you can self-host the entire pipeline at zero per-article cost forever.
Invoiced monthly. Pilot is invite-only and capacity-limited so we can give every publisher real onboarding attention.
What you get — and what you don't lose
Self-hosted
docker compose up. Runs on your infrastructure. No data leaves your network unless you opt into an LLM call.
No vendor lock-in
OpenAI-compatible LLM router — point it at OpenAI, OpenRouter, Anthropic-compat, Groq, Ollama, LiteLLM. Storage is plain disk; swap to S3/Postgres without touching app code.
AGPL-3.0
Fully reusable source code. Run it, modify it, redistribute it. If you run a modified version on a network service, you must offer that source to your users.
No fabrication
The LLM is constrained by structured outputs and explicit candidate lists. ORCIDs, DOIs, RORs, ISSNs come from APIs or PDF text — never invented.
Cost ledger per submission
Every paid call is the editor’s deliberate choice and is recorded in a per-submission USD ledger. Standard auto-fix is $0. Full premium pass tops out around $0.025/paper.
Crossref-ready output
Generates Crossref deposit XML (schema 5.3.1; 5.4.0 on the roadmap). Not just a report — a depositable artefact.
Architecture — five layers, transparent costs
Each layer is independent. The deterministic layers (L0–L2) cover most fields for free. The LLM layers (L3, L4) only run when the editor explicitly opts in, with an upfront cost-confirm dialog.
| Layer | Cost | What |
|---|---|---|
| L0 · Docling layout | $0 | PDF/DOCX → structured JSON, per-element bboxes, page renders. Always runs. |
| L1 · Deterministic factsheet | $0 | Regex sweep for DOI / ORCID / ROR / ISSN / arXiv / license / grant patterns + PDF /Info + header parser + boilerplate anchors (funding, CoI, data availability, ethics). Always runs. |
| L2 · Free identifier APIs | $0 | ORCID public API, ROR v2, OpenAlex, Crossref REST. Affiliation normalisation, name-swap fallback, ROR clear-winner auto-accept. Triggered by the Auto-fix button. |
| L3 · LLM picker (opt-in) | ~$0.0002 / call | Per-field structured-output call that picks among API candidates with reasoning + confidence. Only when the editor clicks "Adjudicate with AI". |
| L4 · LLM structurer (opt-in) | ~$0.0003 – $0.013 / call | Higher-leverage call that takes a content region and returns a clean structured record. Five named tasks: structure_authors, structure_references, structure_funding, structure_credit, verify_authors. |
| Side path · GLiNER2 NER | $0 (on-device) | Local zero-shot NER (fastino/gliner2-large-v1) for ad-hoc entity extraction over a chosen text region. No network calls, no LLM cost. |
For a typical paper, full premium processing tops out around $0.025. Standard auto-fix (no LLM) is $0.
Scored on the same Research Nexus rubric as the leaderboard
One Mandatory deposit gate, then five Research Nexus dimensions using the exact weights you see on nexus-score. The hero score on every paper is the same weighted percentage — so what an editor sees on a single article maps directly to what their publisher sees on the leaderboard.
Mandatory gate
DOI, title, journal, ISSN, year, ≥1 author, full pub date, vol/issue/pages, copyright. Must be satisfied before deposit.
Provenance · 25%
References, refs-with-DOI, preprint→VoR link, Crossmark, conflict-of-interest, data/code availability.
People · 20%
Full author names, ORCID for corresponding + every author, CRediT contributor roles.
Funding · 20%
Funder Registry DOI, award/grant numbers.
Access · 20%
Abstract (plain + JATS), license, OA indicator, plain-language summary.
Organizations · 15%
Affiliations extracted, ROR for every affiliation.
Try it on your titles
Demo first. Pilot if it earns its place. $1/article pilot pricing — sustenance only, AGPL forever.