geo-database

Name: geo-database
Rating: 68% (1 review)
Author: K-Dense-AI

tessl i github:K-Dense-AI/claude-scientific-skills --skill geo-database

github.com/K-Dense-AI/claude-scientific-skills

Access NCBI GEO for gene expression/genomics data. Search/download microarray and RNA-seq datasets (GSE, GSM, GPL), retrieve SOFT/Matrix files, for transcriptomics and expression analysis.

Review Score: 68%
Validation Score: 12/16
Implementation Score: 50%
Activation Score: 83%

Generated 13 days ago

Warnings & errors only

Criterion	Warning
skill_md_line_count	SKILL.md is long (815 lines); consider splitting into references/ and linking
description_trigger_hint	Description may be missing an explicit 'when to use' trigger hint (e.g., 'Use when...')
metadata_version	'metadata.version' is missing
body_steps	No step-by-step structure detected (no ordered list); consider adding a simple workflow

Overall Assessment

This skill provides comprehensive, actionable guidance for accessing GEO data with excellent executable code examples. However, it is severely bloated with unnecessary explanations of concepts Claude already knows (database organization, what expression data is), repetitive code patterns, and content that should be moved to reference files. The lack of validation checkpoints in multi-step workflows is a notable gap.

Suggestions

Reduce the Overview and 'Understanding GEO Data Organization' sections to a brief table or bullet list - Claude doesn't need explanations of what Series, Samples, and Platforms are conceptually.
Move the detailed analysis sections (Quality Control, Differential Expression, Clustering) to a separate ANALYSIS.md reference file, keeping only a brief pointer in the main skill.
Add explicit validation steps to batch processing workflows, e.g., 'Verify download completed: check file exists and size > 0 before proceeding to parse'.
Consolidate repetitive code patterns - the multiple search functions and download examples could be reduced to one canonical example each with brief notes on variations.
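
The validation checkpoint suggested above ("check file exists and size > 0 before proceeding to parse") could look roughly like the sketch below. It is only an illustration: the function names `verify_download` and `partition_downloads` and the example file names are hypothetical and are not taken from the skill itself.

```python
from pathlib import Path


def verify_download(path, min_bytes=1):
    """Return True if `path` is an existing regular file of at least `min_bytes` bytes."""
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_bytes


def partition_downloads(paths):
    """Split `paths` into (ok, failed) lists, preserving input order.

    Only files in `ok` should be handed to the parsing step; files in
    `failed` are candidates for a retry or a logged error.
    """
    ok, failed = [], []
    for path in paths:
        (ok if verify_download(path) else failed).append(path)
    return ok, failed


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    candidates = ["GSE_example_1.soft.gz", "GSE_example_2.soft.gz"]
    ok, failed = partition_downloads(candidates)
    print(f"ready to parse: {ok}")
    print(f"missing or empty, retry before parsing: {failed}")
```

A size check only catches missing and empty files. A truncated but non-empty download would still pass, so a stricter workflow might also try to open the archive or compare against an expected size.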

Dimension	Score	Reasoning
Conciseness	1/3	The skill is extremely verbose at 700+ lines: it explains concepts Claude already knows (what GEO is, along with comparable explanations of data formats), includes extensive background on data organization that could be summarized in a few lines, and repeats similar code patterns multiple times.
Actionability	3/3	The skill provides fully executable Python code examples throughout, with complete import statements, function definitions, and copy-paste ready snippets for searching, downloading, and analyzing GEO data.
Workflow Clarity	2/3	While steps are listed for various operations, there are no explicit validation checkpoints or error recovery feedback loops. The batch processing sections lack verification steps to confirm successful downloads before proceeding with analysis.
Progressive Disclosure	2/3	The skill mentions a reference file (references/geo_reference.md) but includes massive amounts of content inline that should be split out. The 700+ line monolithic structure with 7 major sections could benefit from better organization across multiple files.

Overall Assessment

This is a strong, technically specific description that clearly identifies the NCBI GEO database and relevant data types. The main weakness is the absence of an explicit 'Use when...' clause, which would help Claude know exactly when to select this skill. The domain-specific terminology provides excellent trigger coverage for bioinformatics users.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user asks about gene expression data, GEO datasets, or needs to download transcriptomics data from NCBI.'

Dimension	Score	Reasoning
Specificity	3/3	Lists multiple specific concrete actions: 'Search/download microarray and RNA-seq datasets', 'retrieve SOFT/Matrix files', with specific identifiers (GSE, GSM, GPL) and clear domain (transcriptomics and expression analysis).
Completeness	2/3	Clearly answers 'what' (access NCBI GEO, search/download datasets, retrieve files) but lacks an explicit 'Use when...' clause. The triggers are implied through domain terms but not explicitly stated.
Trigger Term Quality	3/3	Excellent coverage of natural terms users would say: 'NCBI GEO', 'gene expression', 'genomics', 'microarray', 'RNA-seq', 'GSE', 'GSM', 'GPL', 'SOFT', 'Matrix files', 'transcriptomics', 'expression analysis'. These are terms bioinformaticians would naturally use.
Distinctiveness Conflict Risk	3/3	Highly distinctive with specific database (NCBI GEO), file formats (SOFT/Matrix), and identifiers (GSE, GSM, GPL). Very unlikely to conflict with other skills due to the specialized bioinformatics domain.