Research toolkit for triaging academic papers and GitHub projects. Triage papers and tools, reproduce benchmark claims, search Google Scholar, Semantic Scholar, PubMed, or Sci-Hub, and extract structured data from scientific PDFs.
Search Semantic Scholar for papers, fetch structured metadata by paper ID or DOI, profile authors, and traverse citation graphs. Uses the official Semantic Scholar API — no scraping, structured results, free to use.
triage-paper
triage-paper step 1
google-scholar-search whenever both are available
semantic-scholar MCP server is configured: optionally prefer it over this script; the MCP returns the same data with less setup
google-scholar-search
This is a structured API, not a search engine; treat it accordingly.
--paper-id flag.
The paper subcommand requires a Semantic Scholar paper ID, arXiv ID (prefix arXiv:), or DOI. Titles go to search.
The author subcommand requires a Semantic Scholar author ID. A common pitfall is passing a name and getting "not found". Use search first to locate the author ID.
Set SEMANTIC_SCHOLAR_API_KEY to increase limits.

| Goal | Subcommand | Key flag |
|---|---|---|
| Keyword discovery | search | --query |
| Full metadata for a known paper | paper | --paper-id (SS ID, arXiv:NNNN, or DOI) |
| Author publications and h-index | author | --author-id (Semantic Scholar ID) |
| Papers citing or cited by a paper | citations | --paper-id + --type |
See setup-and-troubleshooting.md.
# Keyword discovery
./scripts/semantic-scholar-search.py search \
  --query "retrieval-augmented generation" --limit 10

# Full paper metadata: Semantic Scholar ID, arXiv ID, or DOI
./scripts/semantic-scholar-search.py paper --paper-id "arXiv:2005.11401"
./scripts/semantic-scholar-search.py paper --paper-id "10.18653/v1/2020.acl-main.740"

# Author profile: requires Semantic Scholar author ID
./scripts/semantic-scholar-search.py author --author-id "1741101"

# Citations and references for a known paper
./scripts/semantic-scholar-search.py citations --paper-id "arXiv:2005.11401" --type citations
./scripts/semantic-scholar-search.py citations --paper-id "arXiv:2005.11401" --type references

# Export any result to JSON
./scripts/semantic-scholar-search.py search \
  --query "chain-of-thought reasoning" --limit 10 --format json \
  --output /tmp/candidates.json

# Step 1: search for the author's papers to locate their ID
./scripts/semantic-scholar-search.py search --query "<author name> <topic>" --limit 5 --format json \
  | python3 -c "import json,sys; [print(a['authorId'], a['name']) for p in json.load(sys.stdin) for a in p.get('authors',[])]"

Show results to the user. NEVER triage automatically. For each promising paper offer:
Found N results. Would you like to triage any of these with
triage-paper?
For known papers, pass the arxiv ID or DOI directly to triage-paper step 1.
WHY: The paper subcommand resolves by ID only. Passing a title returns "not found" or a wrong paper.
BAD --paper-id "Attention Is All You Need" → GOOD --paper-id "arXiv:1706.03762" or the Semantic Scholar paper ID.
WHY: The author subcommand requires a Semantic Scholar author ID (numeric string). Names are not accepted.
BAD --author-id "Yoshua Bengio" → GOOD Locate the numeric author ID via search first, then use --author-id "1794492".
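The name-to-ID lookup described above can be sketched as a small filter over the JSON that `search --format json` prints. This is a minimal sketch under assumptions: the output is taken to be a list of paper objects, each with an `authors` list of `authorId`/`name` pairs (the shape the one-liner in the usage examples relies on). The function name `find_author_ids` and the sample records are hypothetical.

```python
def find_author_ids(papers, name_fragment):
    """Return {authorId: name} for authors whose name contains name_fragment.

    `papers` is assumed to be the parsed JSON list from the search subcommand.
    """
    needle = name_fragment.casefold()
    matches = {}
    for paper in papers:
        for author in paper.get("authors", []):
            author_id = author.get("authorId")
            name = author.get("name") or ""
            # Entries without an ID cannot be passed to --author-id; skip them.
            if author_id and needle in name.casefold():
                matches[author_id] = name
    return matches


# Hypothetical sample data standing in for real search output.
sample = [
    {"title": "Paper A", "authors": [
        {"authorId": "111", "name": "Ada Example"},
        {"authorId": "222", "name": "Bob Sample"},
    ]},
    {"title": "Paper B", "authors": [
        {"authorId": "111", "name": "Ada Example"},
        {"authorId": None, "name": "Ada Unknown"},
    ]},
]

for author_id, name in find_author_ids(sample, "ada").items():
    print(author_id, name)
```

If several IDs come back for the same name, show them to the user instead of picking one; the same name can belong to different people.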
WHY: Without an API key, the rate limit is 100 requests per 5 minutes. A tight loop will trigger a 429 and block further queries.
BAD Fetch citations recursively in a loop without rate limiting. → GOOD Batch fetches with time.sleep(1) or set SEMANTIC_SCHOLAR_API_KEY for higher quota.
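One way to pace a batch of fetches is sketched below. It is an illustration, not the script's own behavior: `fetch_all`, the `RateLimited` exception, and the caller-supplied `fetch` function are hypothetical names. The fixed pause follows the `time.sleep(1)` advice above; the exponential backoff after a 429 is an added assumption.

```python
import time


class RateLimited(Exception):
    """Raised by the caller-supplied fetch function when it sees an HTTP 429."""


def fetch_all(paper_ids, fetch, delay=1.0, max_retries=3, sleep=time.sleep):
    """Fetch each ID in turn, pausing between requests and backing off on 429."""
    results = {}
    for index, paper_id in enumerate(paper_ids):
        if index:
            sleep(delay)  # fixed pause between consecutive requests
        for attempt in range(max_retries + 1):
            try:
                results[paper_id] = fetch(paper_id)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise  # give up after the final retry
                sleep(delay * 2 ** (attempt + 1))  # 2x, 4x, 8x the base delay
    return results


# Offline demo: a fake fetch that is rate limited once, with sleeping disabled.
seen = []


def demo_fetch(paper_id):
    seen.append(paper_id)
    if paper_id == "arXiv:2005.11401" and seen.count(paper_id) == 1:
        raise RateLimited()
    return {"paperId": paper_id}


demo = fetch_all(["arXiv:1706.03762", "arXiv:2005.11401"], demo_fetch, sleep=lambda s: None)
print(sorted(demo))
```

Passing `sleep` as a parameter keeps the pacing logic testable without waiting; in real use the default `time.sleep` applies.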
WHY: Semantic Scholar search ranks by relevance, not publication date or citation count. The top 10 results are not "the 10 most important papers on X".
BAD Claim "here are the key papers on X" from a top-10 search. → GOOD Qualify results as "top results by Semantic Scholar relevance ranking."
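When a user asks for an ordering other than relevance, an exported candidate list can be re-sorted explicitly so the ordering is stated rather than implied. The sketch below assumes each exported record carries `citationCount` and `year` fields; whether the script's JSON export includes them is not confirmed here, so check the actual output first. The helper `rerank` and the sample records are hypothetical.

```python
def rerank(papers, key="citationCount"):
    """Return papers sorted by a numeric field, highest first.

    Missing or null values are treated as 0 so they sort last.
    """
    return sorted(papers, key=lambda paper: paper.get(key) or 0, reverse=True)


# Hypothetical records in relevance order, as a search might return them.
candidates = [
    {"title": "Most relevant", "citationCount": 12, "year": 2023},
    {"title": "Second", "citationCount": 950, "year": 2020},
    {"title": "Third", "citationCount": None, "year": 2024},
]

for paper in rerank(candidates):
    print(paper["citationCount"], paper["title"])
```

A citation-count ordering is still not a measure of importance; label the output with the sort key that was used.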
WHY: Discovery and triage are separate quality gates. Auto-triaging bypasses user review.
BAD Run triage-paper on every result. → GOOD Present the candidate list; confirm with the user which papers to triage.
WHY: The API distinguishes ID types. An unprefixed DOI is treated as a Semantic Scholar paper ID and will return "not found".
BAD --paper-id "10.18653/v1/2020.acl-main.740" when the script expects an S2 ID. → GOOD Confirm the ID type; arxiv IDs need arXiv: prefix; DOIs are accepted as-is in most contexts but verify against the script's --help.
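The ID check can be made mechanical before calling the `paper` subcommand. The sketch below is a heuristic under assumptions: Semantic Scholar paper IDs are taken to be 40-character hex strings, bare arXiv IDs are taken to match the `NNNN.NNNNN` form, and DOIs are passed through unchanged as the text above describes. `normalize_paper_id` is a hypothetical helper, not part of the script.

```python
import re

ARXIV_BARE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")  # e.g. 2005.11401 or 1706.03762v5
S2_HEX = re.compile(r"^[0-9a-f]{40}$")               # assumed Semantic Scholar ID shape


def normalize_paper_id(value):
    """Guess the ID type and return (kind, value_to_pass).

    kind is one of "arxiv", "doi", "s2", or "title"; a "title" result
    should go to the search subcommand, not to --paper-id.
    """
    value = value.strip()
    if value.lower().startswith("arxiv:"):
        return "arxiv", "arXiv:" + value[len("arxiv:"):]
    if ARXIV_BARE.match(value):
        return "arxiv", "arXiv:" + value  # add the required prefix
    if value.startswith("10.") and "/" in value:
        return "doi", value
    if S2_HEX.match(value.lower()):
        return "s2", value.lower()
    return "title", value


for raw in ["2005.11401", "10.18653/v1/2020.acl-main.740", "Attention Is All You Need"]:
    print(normalize_paper_id(raw))
```

Anything classified as `title` is a signal to run `search` first; the script's `--help` remains the authority on which ID forms it accepts.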
triage-paper for semantic-scholar MCP config