Research toolkit for triaging academic papers and GitHub projects. Triage papers and tools, reproduce benchmark claims, search Google Scholar, Semantic Scholar, PubMed, or Sci-Hub, and extract structured data from scientific PDFs.
Search and analyze biomedical literature from PubMed using the free NCBI E-utilities API.
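Under the hood, PubMed searches go through the E-utilities ESearch endpoint. A minimal sketch of the kind of request the script issues (the endpoint and parameter names are real E-utilities; the helper function is illustrative, not the script's actual code):

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def build_esearch_url(term: str, retmax: int = 10) -> str:
    """Build an ESearch URL that returns matching PMIDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"

print(build_esearch_url("CRISPR gene editing", retmax=10))
```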
Before running the script, check the context:
- Downstream step: search results feed into triage-paper.
- If a semantic-scholar MCP or PubTator MCP is configured, prefer the MCP; it returns structured data with no rate-limit risk.
- If google-scholar-search or semantic-scholar-search already produced /tmp/<topic>-candidates.json, reuse it.

When available, prefer the PubTator MCP server over this script:
```json
{
  "mcpServers": {
    "pubtator": {
      "type": "stdio",
      "command": "uvx",
      "args": ["pubtator-mcp-server"]
    }
  }
}
```

Search is discovery, not analysis. The goal is a structured candidate list.
ALWAYS check for a pubtator or pubmed MCP before running the script. If configured and reachable, prefer it.
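The MCP check can be as simple as looking for a matching entry under `mcpServers` in the client configuration. A minimal sketch, assuming a JSON config file shaped like the snippet above (the config path is whatever your MCP client uses; this helper is hypothetical, not part of the toolkit):

```python
import json
from pathlib import Path

def mcp_available(config_path: str, names=("pubtator", "pubmed")) -> bool:
    """Return True if any of the named servers appears under mcpServers."""
    path = Path(config_path)
    if not path.is_file():
        return False
    try:
        servers = json.loads(path.read_text()).get("mcpServers", {})
    except json.JSONDecodeError:
        return False
    return any(name in servers for name in names)
```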
See setup-and-troubleshooting.md for venv creation and dependency installation.
Activate the venv, then choose the appropriate subcommand. You may optionally add --show-abstract to keyword searches for a richer preview.
```shell
# Basic keyword search
./scripts/pubmed_search.py search --keywords "CRISPR gene editing" --results 10

# Advanced: filter by author, journal, and date range
./scripts/pubmed_search.py search --term "cancer immunotherapy" --author "Smith" \
  --journal "Nature" --start-date "2021" --end-date "2024" --results 20

# Fetch metadata for a known PMID
./scripts/pubmed_search.py metadata --pmid "33303479" --format json

# Deep paper analysis
./scripts/pubmed_search.py analyze --pmid "33303479" --output analysis.md

# Download open-access PDF
./scripts/pubmed_search.py download --pmid "33303479" --output-dir ./papers/

# Export candidate list to JSON
./scripts/pubmed_search.py search --keywords "Alzheimer disease biomarkers" \
  --results 50 --format json --output /tmp/candidates.json
```

If the script returns HTTP 429, wait 30 s and retry once:

```shell
sleep 30 && ./scripts/pubmed_search.py search --keywords "<topic>" --results 10
```

NEVER retry in a tight loop. If still failing, set PUBMED_API_KEY in the environment.
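The retry-once policy can be sketched as follows; the function names are illustrative (not part of the script), and the waiter is injected so the cooldown logic is testable without sleeping:

```python
import time

def search_once_with_retry(do_search, wait_seconds=30, sleep=time.sleep):
    """Run do_search(); on a rate-limit error, wait once and retry once.

    do_search is any callable that raises RuntimeError when rate-limited.
    A second failure propagates to the caller: never loop indefinitely.
    """
    try:
        return do_search()
    except RuntimeError:
        sleep(wait_seconds)   # single cooldown, mirroring `sleep 30`
        return do_search()    # if this also fails, give up and surface it
```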
NEVER triage automatically; ALWAYS confirm with the user first:

> Found N results. Would you like to triage any of these with triage-paper?
WHY: Discovery and triage are separate quality gates. Auto-triaging bypasses user review.
BAD Pass every result to triage-paper immediately. → GOOD Present the list; wait for the user to choose.
WHY: Repeated rapid retries worsen the block and extend the cooldown period.
BAD Loop search until it succeeds. → GOOD Retry once after 30 s; switch to MCP or API key on second failure.
WHY: Many PMC articles are not open access; the download command will fail or return a redirect link.
BAD Call download for every PMID. → GOOD Check PMC availability and open-access status before downloading.
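The PMC Open Access web service (`oa.fcgi`) reports whether an article's full text is actually retrievable. A sketch of how a pre-download check might parse its response; the helper and the sample XML shapes are assumptions modeled on the service's documented record/error format, not live responses:

```python
import xml.etree.ElementTree as ET

def is_open_access(oa_xml: str) -> bool:
    """Parse an oa.fcgi response: True if a downloadable record is present."""
    root = ET.fromstring(oa_xml)
    if root.find(".//error") is not None:
        return False
    return root.find(".//record") is not None

# Illustrative sample responses, not fetched data.
OA_HIT = ('<OA><records><record id="PMC123456">'
          '<link format="pdf" href="https://example.org/x.pdf"/>'
          '</record></records></OA>')
OA_MISS = '<OA><error code="idIsNotOpenAccess">not open access</error></OA>'
```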
WHY: The Python script is the fragile fallback. Skipping the check needlessly risks rate-limiting.
BAD Invoke the script without checking for a PubTator or PubMed MCP. → GOOD ALWAYS check MCP availability first; only fall back to the script if no MCP is configured.
WHY: API keys committed to source are a production security risk and will be rotated or revoked.
BAD Set api_key = "abc123" inside the script. → GOOD ALWAYS use environment variables (PUBMED_API_KEY) or a .env file that is gitignored.
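Reading the key from the environment is a one-liner. A minimal sketch (`api_key` is the real E-utilities parameter name; the helper function itself is illustrative):

```python
import os

def eutils_params(**params) -> dict:
    """Merge request params with an API key from the environment, if set."""
    api_key = os.environ.get("PUBMED_API_KEY")
    if api_key:
        # An NCBI API key raises the rate limit from 3 to 10 requests/second.
        params["api_key"] = api_key
    return params
```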
WHY: PubMed metadata contains only the abstract. Deep analysis based solely on abstracts is incomplete and a pitfall for research quality.
BAD Mark a paper as "fully analyzed" from analyze output alone. → GOOD Qualify the analysis as "abstract-based" and recommend obtaining the full text for production use.
google-scholar-search
pubmed-search
reproduce-benchmark
sci-data-extractor
sci-hub-search
semantic-scholar-search
triage-paper
triage-tool