
neo4j-modeling-skill

Design, review, and refactor Neo4j graph data models. Use when choosing node labels vs relationship types vs properties, migrating relational/document schemas to graph, detecting anti-patterns (generic labels, supernodes, missing constraints), designing intermediate nodes for n-ary relationships, enforcing schema with constraints and indexes, or assessing an existing model against graph modeling best practices. Does NOT handle Cypher query authoring — use neo4j-cypher-skill. Does NOT handle Spring Data Neo4j entity mapping — use neo4j-spring-data-skill. Does NOT handle GraphQL type definitions — use neo4j-graphql-skill. Does NOT handle data import — use neo4j-import-skill.

When to Use

  • Designing graph model from scratch (domain → nodes, rels, props)
  • Reviewing existing model for anti-patterns
  • Deciding node vs property vs relationship vs label
  • Migrating relational or document schema to graph
  • Designing intermediate nodes for n-ary or complex relationships
  • Detecting and mitigating supernode / high-fanout problems
  • Choosing and creating constraints + indexes for a model

When NOT to Use

  • Writing or optimizing Cypher → neo4j-cypher-skill
  • Spring Data Neo4j (@Node, @Relationship) → neo4j-spring-data-skill
  • GraphQL type definitions → neo4j-graphql-skill
  • Importing data (LOAD CSV, APOC import) → neo4j-import-skill

Inspect Before Designing

On an existing database, run these first; never propose changes without knowing the current state:

CALL db.schema.visualization() YIELD nodes, relationships RETURN nodes, relationships;
SHOW CONSTRAINTS YIELD name, type, labelsOrTypes, properties RETURN name, type, labelsOrTypes, properties;
SHOW INDEXES YIELD name, type, labelsOrTypes, state WHERE state = 'ONLINE' RETURN name, type, labelsOrTypes;

If APOC is available:

CALL apoc.meta.schema() YIELD value RETURN value;

MCP tool map:

| Operation | Tool |
|---|---|
| Inspect schema | get-schema |
| SHOW CONSTRAINTS, SHOW INDEXES | read-cypher |
| CREATE CONSTRAINT ... IF NOT EXISTS | write-cypher (show + confirm first) |

Defaults — Apply to Every Model

  1. Use-case first — list 5+ queries the model must answer before designing
  2. Nodes = entities (nouns) with identity; rels = connections (verbs) with direction
  3. Labels PascalCase; rel types SCREAMING_SNAKE_CASE; properties camelCase
  4. Every node type used in MERGE has a uniqueness constraint on its key property
  5. Add property type constraints (REQUIRE n.prop IS :: STRING; Enterprise) where the type is known; they help the query planner and catch bad writes early
  6. No generic labels (:Entity, :Node, :Thing); no generic rel types (:RELATED_TO, :HAS)
  7. Security labels (used for row-level access control) should start with a common prefix (e.g. Sec) so application code can reliably filter them out of the domain schema
  8. Rel direction encodes semantic meaning — not arbitrary
  9. Inspect schema before proposing any change on an existing database
  10. All constraint/index DDL uses IF NOT EXISTS — safe to rerun
  11. On Neo4j 2026.02+ (Enterprise/Aura): consider ALTER CURRENT GRAPH TYPE SET { … } or EXTEND GRAPH TYPE WITH { … } to declare the full model in one block instead of individual CREATE CONSTRAINT statements — see neo4j-cypher-skill/references/graph-type.md. PREVIEW — syntax may change before GA.

Key Patterns

Node vs Relationship vs Property — Decision Table

| Question | Answer | Model as |
|---|---|---|
| Is it a thing with identity, queried as an entry point? | Yes | Node |
| Is it a connection between two things with direction? | Yes | Relationship |
| Does the connection have its own properties or multiple targets? | Yes | Intermediate node |
| Is it a scalar always returned with its parent, never filtered alone? | Yes | Property on parent |
| Is it a category used for type-based filtering or path traversal? | Yes | Label (not a property) |
| Does the same attribute value repeat across many nodes (low cardinality)? | Yes | Label, not a property node |
| Is it a fact connecting >2 entities? | Yes | Intermediate node |

Property vs Label — Decision Table

| Use label when | Use property when |
|---|---|
| Values are few, fixed, used as traversal filters (WHERE n:Active) | Values are many, dynamic, or unique per node |
| You traverse by type (MATCH (n:VIPCustomer)) | You filter by value (WHERE n.tier = 'vip') |
| Category drives index selection | Fine-grained value drives range scans |
| Example: :Active, :Verified, :Premium | Example: status, score, email |

Rule: adding a label is cheap; scanning all :Label nodes is fast. Never model high-cardinality values as labels.
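The table boils down to two questions: how many distinct values the attribute has, and whether queries traverse by it. A minimal Python sketch of that rule follows; the cut-off of 10 distinct values is a hypothetical number chosen for illustration, not a Neo4j limit, and the function name is made up.

```python
from collections.abc import Hashable, Iterable

# Hypothetical cut-off for "few, fixed values"; not a Neo4j rule, tune it per model.
MAX_LABEL_VALUES = 10


def suggest_encoding(values: Iterable[Hashable], used_as_traversal_filter: bool,
                     max_label_values: int = MAX_LABEL_VALUES) -> str:
    """Return 'label' or 'property' for one attribute, following the table above."""
    distinct_count = len(set(values))
    low_cardinality = distinct_count <= max_label_values
    # Only low-cardinality attributes that drive traversal become labels;
    # high-cardinality, unique, or range-filtered values stay properties.
    if low_cardinality and used_as_traversal_filter:
        return "label"
    return "property"
```

The observed values could come from a query such as MATCH (n:Customer) RETURN n.status, where :Customer and status are placeholders for the attribute under review.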


Intermediate Node Pattern

Use when a relationship needs its own properties, connects >2 entities, or is independently queryable.

Before (relationship with property — limited):

(Person)-[:ACTED_IN {role: "Neo"}]->(Movie)
// Role lives on the relationship, so nothing else can connect to it

After (intermediate node — queryable, extensible):

(Person)-[:PLAYED]->(Role {name: "Neo"})-[:IN]->(Movie)
// MATCH (r:Role) WHERE r.name STARTS WITH 'Neo' RETURN r

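Employment overlap example (:Employment is an intermediate node holding startDate and endDate):

// Find pairs of colleagues whose employment at the same company overlapped in time
MATCH (p1:Person)-[:WORKED_AT]->(e1:Employment)-[:AT]->(c:Company)<-[:AT]-(e2:Employment)<-[:WORKED_AT]-(p2:Person)
WHERE elementId(p1) < elementId(p2)  // each pair once, never a person paired with themselves
  AND e1.startDate <= e2.endDate AND e2.startDate <= e1.endDate
RETURN DISTINCT p1.name, p2.name, c.name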
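The date predicate in the WHERE clause is the standard closed-interval overlap test: two periods overlap when each starts no later than the other ends. A minimal Python sketch of the same test follows. It assumes a missing end date means the job is ongoing, which is an assumption not stated in the model; the query as written drops such rows because comparisons with null yield null, and wrapping the end dates in coalesce(e1.endDate, date()) is one way to handle that in Cypher.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Employment:
    start_date: date
    end_date: Optional[date]  # None = still employed (assumption for this sketch)


def overlaps(a: Employment, b: Employment) -> bool:
    """Closed-interval overlap: a.start <= b.end AND b.start <= a.end."""
    a_end = a.end_date or date.max  # treat an open-ended job as running forever
    b_end = b.end_date or date.max
    return a.start_date <= b_end and b.start_date <= a_end
```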

Promote relationship to intermediate node when:

  • Relationship has >2 properties
  • Relationship is the subject of another query
  • Multiple entities share the same connection context
  • You need to connect >2 entities in one fact

Relational → Graph Migration Table

| Relational construct | Graph equivalent | Notes |
|---|---|---|
| Table row | Node | One label per table (add more as needed) |
| Column (scalar) | Node property | |
| Primary key | Uniqueness-constrained property | Prefer a domain-qualified key (tmdbId) over a bare id |
| Foreign key | Relationship | Direction: from dependent → referenced |
| Many-to-many junction table with its own columns | Intermediate node | Junction columns become properties of the intermediate node |
| Junction table (no own columns) | Direct relationship | Simpler; upgrade to intermediate node later |
| NULL FK (optional relation) | Absent relationship | No relationship created; its absence is the signal |
| Polymorphic FK (Rails-style) | Multiple labels or relationship types | Split into type-specific rels |
| Self-referential FK | Same-label relationship | :Employee {managerId} → (e)-[:REPORTS_TO]->(m) |
| Audit/history columns | Intermediate versioning node | See References for versioning pattern |
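The foreign-key, NULL-FK, and self-referential rows combine in a single step: each non-null managerId becomes one REPORTS_TO relationship, and a null simply produces nothing. A minimal Python sketch follows; the row shape and the employeeId / managerId column names are hypothetical, and loading the resulting pairs (for example via an UNWIND ... MERGE statement) belongs to neo4j-import-skill.

```python
from collections.abc import Iterable, Mapping
from typing import Optional


def reports_to_edges(rows: Iterable[Mapping[str, Optional[int]]]) -> list[tuple[int, int]]:
    """Map employee rows with a self-referential managerId FK to (employeeId, managerId) pairs.

    Each pair becomes one (e)-[:REPORTS_TO]->(m) relationship, pointing from the
    dependent row to the referenced row. A NULL managerId yields no pair, because
    the absence of the relationship is the signal.
    """
    return [(row["employeeId"], row["managerId"])
            for row in rows
            if row["managerId"] is not None]
```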

Supernode Detection and Mitigation

Detect:

// Find top-10 highest-degree nodes
// Scans every node: on a large graph, restrict to one label, e.g. MATCH (n:User)
MATCH (n)
RETURN labels(n) AS labels, elementId(n) AS id, count{ (n)--() } AS degree
ORDER BY degree DESC LIMIT 10

A node whose degree is far above the median for its label is a supernode candidate. As a rough [field] heuristic, a node with more than ~100K relationships noticeably slows any traversal that passes through it.
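To turn "far above the median" into a concrete check, the detection query's rows can be grouped by label and compared against each label's median degree. A minimal Python sketch follows, assuming each row has been reduced to a single label (labels(n) returns a list) and reading "far above" as more than 10 times the median; both the ratio and the names are illustrative assumptions.

```python
from collections import defaultdict
from statistics import median
from typing import Iterable, NamedTuple

RATIO = 10               # hypothetical reading of "degree >> median"
ABSOLUTE_LIMIT = 100_000  # the ~100K [field] heuristic from the text


class NodeDegree(NamedTuple):
    node_id: str
    label: str
    degree: int


def supernode_candidates(rows: Iterable[NodeDegree], ratio: float = RATIO,
                         absolute_limit: int = ABSOLUTE_LIMIT) -> list[str]:
    """Return ids of nodes far above their label's median degree or over the absolute limit."""
    by_label: dict[str, list[NodeDegree]] = defaultdict(list)
    for row in rows:
        by_label[row.label].append(row)

    flagged = []
    for nodes in by_label.values():
        med = median(n.degree for n in nodes)
        for n in nodes:
            relative = med > 0 and n.degree > ratio * med
            if n.degree > absolute_limit or relative:
                flagged.append(n.node_id)
    return flagged
```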

Causes:

  • Domain supernodes: airports, celebrities, popular hashtags — unavoidable
  • Modeling supernodes: gender, country, status modeled as nodes with millions of edges — avoidable

Mitigation strategies (in priority order):

| Strategy | When to use | Implementation |
|---|---|---|
| Query direction | Directional asymmetry exists | Query from the low-degree side; exploit direction |
| Relationship type split | Supernode serves multiple roles | :FOLLOWS + :FAN instead of a single :RELATED_TO |
| Label segregation | Supernode conflates entity types | :Celebrity vs :User → query only the relevant subtype |
| Bucket pattern | Time-series or high-volume event nodes | See below |
| Avoid modeling | Low-cardinality categoricals | Use a label instead of a node (:Active, not (:Status {name:"Active"})) |
| Join hint | Query tuning, last resort | USING JOIN ON n in Cypher |

Bucket pattern (time-series / high-volume):

// Instead of: (:User)-[:VIEWED]->(:Page) (millions of rels per user)
// Bucket by hour:
(u:User)-[:VIEWED_IN]->(b:ViewBucket {userId: u.id, hour: '2025-04-28T14'})-[:VIEWED]->(p:Page)

// Query last hour's views without traversing full history:
MATCH (u:User {id: $uid})-[:VIEWED_IN]->(b:ViewBucket {hour: $hour})-[:VIEWED]->(p)
RETURN p.url
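The bucket key has to be computed the same way on every write and every read, or a view lands in a bucket that queries never look up. A minimal Python sketch of a key builder matching the 'YYYY-MM-DDTHH' format used above follows; normalising to UTC is an assumption of this sketch, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def hour_bucket_key(ts: datetime) -> str:
    """Format a timestamp as the :ViewBucket hour key, e.g. '2025-04-28T14' (UTC)."""
    if ts.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so buckets do not shift with local time")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")


def recent_bucket_keys(now: datetime, hours: int) -> list[str]:
    """Keys for the current hour and the hours - 1 hours before it, newest first."""
    return [hour_bucket_key(now - timedelta(hours=i)) for i in range(hours)]
```

A single key would be passed as $hour; for a window of several hours, a list of keys could be matched with b.hour IN $hours instead.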

Naming Conventions

| Element | Convention | Good | Bad |
|---|---|---|---|
| Node label | PascalCase, singular noun | :Person, :BlogPost | :person, :blog_posts, :Entity |
| Relationship type | SCREAMING_SNAKE_CASE, verb phrase | :ACTED_IN, :WORKS_FOR | :actedin, :relatedTo, :HAS |
| Property key | camelCase | firstName, createdAt | FirstName, first_name |
| Constraint name | snake_case, descriptive | person_id_unique | constraint1 |
| Index name | snake_case, descriptive | person_name_idx | index2 |
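The case conventions in the table can be checked mechanically during a schema review; "singular noun", "verb phrase", and "descriptive" cannot, so they still need a human eye (constraint1 passes the snake_case check, for instance). A minimal Python sketch follows; the generic-name lists come from this document's anti-patterns, and the function name is hypothetical.

```python
import re

# Case rules from the naming table; semantic rules (singular noun, descriptive) are not checked.
PATTERNS = {
    "label": re.compile(r"[A-Z][A-Za-z0-9]*"),                    # PascalCase
    "rel_type": re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*"),      # SCREAMING_SNAKE_CASE
    "property": re.compile(r"[a-z][A-Za-z0-9]*"),                 # camelCase
    "schema_name": re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"),  # snake_case
}
GENERIC = {
    "label": {"Entity", "Node", "Thing"},
    "rel_type": {"RELATED_TO", "HAS", "CONNECTED_TO"},
}


def naming_issues(kind: str, name: str) -> list[str]:
    """Return human-readable problems with a label, rel type, property, or schema name."""
    issues = []
    if not PATTERNS[kind].fullmatch(name):
        issues.append(f"{name!r} does not match the {kind} convention")
    if name in GENERIC.get(kind, set()):
        issues.append(f"{name!r} is a generic {kind}; use a domain-specific name")
    return issues
```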

Schema Enforcement — What to Create for Each Element

Run all DDL with IF NOT EXISTS. Apply before importing data.

// 1. Uniqueness constraint — every node type used in MERGE
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
  FOR (p:Person) REQUIRE p.id IS UNIQUE;

// 2. Existence constraint (Enterprise) — mandatory properties
CREATE CONSTRAINT person_name_exists IF NOT EXISTS
  FOR (p:Person) REQUIRE p.name IS NOT NULL;

// 3. Property type constraint (Enterprise) — enforce data type
CREATE CONSTRAINT person_born_integer IF NOT EXISTS
  FOR (p:Person) REQUIRE p.born IS :: INTEGER;

// 4. Key constraint (Enterprise) — unique + exists in one
CREATE CONSTRAINT movie_tmdbid_key IF NOT EXISTS
  FOR (m:Movie) REQUIRE m.tmdbId IS NODE KEY;

// 5. Range index — equality and range filters on properties
CREATE INDEX person_name_idx IF NOT EXISTS
  FOR (p:Person) ON (p.name);

// 6. Fulltext index: Lucene free-text search, queried via db.index.fulltext.queryNodes(); not used for CONTAINS / STARTS WITH predicates
CREATE FULLTEXT INDEX person_fulltext IF NOT EXISTS
  FOR (n:Person) ON EACH [n.name, n.bio];

// 7. Vector index — embedding similarity search
CREATE VECTOR INDEX chunk_embedding_idx IF NOT EXISTS
  FOR (c:Chunk) ON (c.embedding)
  OPTIONS { indexConfig: { `vector.dimensions`: 1536, `vector.similarity_function`: 'cosine' } };

// 8. Relationship index — filter on rel properties
CREATE INDEX acted_in_year_idx IF NOT EXISTS
  FOR ()-[r:ACTED_IN]-() ON (r.year);
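When a model has many labels, the per-label DDL above is repetitive enough to generate. A minimal Python sketch follows that emits uniqueness constraints for MERGE keys and range indexes for filtered properties. The helper names are hypothetical, and names are simply lower-cased to reproduce person_id_unique and person_name_idx from the examples. Because identifiers are interpolated into the statement text, the sketch rejects anything that is not a plain identifier.

```python
import re

_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def _check(identifier: str) -> str:
    # Labels and keys are spliced into DDL text, so accept only plain identifiers.
    if not _IDENT.fullmatch(identifier):
        raise ValueError(f"unsafe identifier: {identifier!r}")
    return identifier


def uniqueness_constraint(label: str, key: str) -> str:
    name = f"{_check(label).lower()}_{_check(key).lower()}_unique"
    return f"CREATE CONSTRAINT {name} IF NOT EXISTS FOR (n:{label}) REQUIRE n.{key} IS UNIQUE"


def range_index(label: str, prop: str) -> str:
    name = f"{_check(label).lower()}_{_check(prop).lower()}_idx"
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"


def schema_ddl(merge_keys: dict[str, str], indexed: dict[str, list[str]]) -> list[str]:
    """Constraints first (one per MERGE-target label), then range indexes."""
    statements = [uniqueness_constraint(label, key) for label, key in merge_keys.items()]
    for label, props in indexed.items():
        statements.extend(range_index(label, prop) for prop in props)
    return statements
```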

After creating indexes, poll until every index is ONLINE (an index in FAILED state never becomes ONLINE; drop and recreate it):

SHOW INDEXES YIELD name, state WHERE state <> 'ONLINE' RETURN name, state;

Do NOT use an index until state = ONLINE.
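A polling loop needs two exits besides success: a deadline, and an early stop on FAILED, since that state is permanent. A minimal Python sketch follows. fetch_states is a hypothetical callable returning {index_name: state}, for example built from SHOW INDEXES YIELD name, state through whichever driver is in use. Neo4j also provides a db.awaitIndexes() procedure that blocks until indexes are online, which may be simpler where it is available.

```python
import time
from collections.abc import Callable, Mapping


def wait_for_indexes(fetch_states: Callable[[], Mapping[str, str]],
                     timeout_s: float = 300.0, interval_s: float = 2.0) -> None:
    """Block until every index reports ONLINE; raise on FAILED or on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        states = fetch_states()
        failed = sorted(name for name, state in states.items() if state == "FAILED")
        if failed:
            raise RuntimeError(f"indexes failed to populate: {failed}")
        pending = sorted(name for name, state in states.items() if state != "ONLINE")
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indexes still not ONLINE: {pending}")
        time.sleep(interval_s)
```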


Vector / Embedding Property Modeling

Store embeddings on dedicated :Chunk nodes, never on business nodes:

(:Document)-[:HAS_CHUNK]->(c:Chunk {text: "...", embedding: [...]})

Rules:

  • Chunk node: text (source text), embedding (float array), chunkIndex (int)
  • Parent document: metadata only (title, url, createdAt)
  • Vector index on c.embedding only
  • Chunk size 200–500 tokens with 20% overlap is production default [field]
  • Do NOT put embedding on :Document — makes the node too large and pollutes traversal
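The chunking default reads as a sliding window: a fixed number of tokens per chunk, with each window starting 80% of a chunk after the previous one so that neighbours share 20%. A minimal Python sketch follows; it works on an already-tokenized sequence (the tokenizer is out of scope and assumed), and 400 tokens is simply a value inside the 200–500 range. The position of each chunk in the returned list is what would be stored as chunkIndex.

```python
from collections.abc import Sequence


def chunk_tokens(tokens: Sequence[str], size: int = 400,
                 overlap_ratio: float = 0.2) -> list[list[str]]:
    """Split tokens into windows of `size`; consecutive windows share int(size * overlap_ratio) tokens."""
    if size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("size must be positive and overlap_ratio must be in [0, 1)")
    step = size - int(size * overlap_ratio)  # always >= 1 because overlap_ratio < 1
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):  # this window reached the end; stop
            break
    return chunks
```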

Anti-Patterns Table

| Anti-pattern | Problem | Fix |
|---|---|---|
| Generic labels :Entity, :Node | No filtering benefit; queries scan every node under the generic label | Use domain labels :Person, :Product |
| Generic rel types :RELATED_TO, :HAS | Can't filter by relationship type | Use semantic types :PURCHASED, :AUTHORED |
| Low-cardinality value as node | Supernode (:Status {name:"active"} → millions of edges) | Use label :Active instead |
| Property as label (n.type = 'VIP' + :VIP label both exist) | Inconsistency, duplication | Pick one; prefer label if used in traversal |
| Storing embeddings on business node | Node bloat, slow traversal | Dedicated :Chunk node |
| MERGE without uniqueness constraint | Duplicate nodes silently created | Add constraint before any MERGE |
| Missing relationship direction meaning | Arbitrary direction; confusing model | Direction = semantic flow of action |
| Junction table modeled as bare property | Loses history and extensibility | Intermediate node with its own properties |
| id as property name | id(n) is a deprecated Cypher function (use elementId(n)); a bare id property works but does not say what it identifies | Prefer domain-qualified keys: personId, movieId, tmdbId |
| All dates as strings | No temporal arithmetic or truncation; range filters only work if every value uses one ISO-8601 format | Use Neo4j date() or datetime() values |

Output Format — Schema Assessment

When reviewing an existing model:

## Schema Assessment

### Compliant
- [constraint / pattern that is correct]

### Issues Found
#### [Title] — Severity: ERROR / WARNING / INFO
- **Current**: what the model does
- **Problem**: why it is an issue
- **Fix**: specific Cypher DDL or model change

## Recommended Schema
### Node Labels
- :Label {key: TYPE, prop: TYPE, ...}  → constraints: [list]

### Relationships
- (:LabelA)-[:TYPE {prop: TYPE}]->(:LabelB)

### Constraints to Create
[CREATE CONSTRAINT ... statements]

### Indexes to Create
[CREATE INDEX ... statements]

Severity semantics:

| Severity | Meaning | Action |
|---|---|---|
| ERROR | Model correctness failure (duplicates possible, data loss risk) | Stop; fix before proceeding |
| WARNING | Performance or extensibility risk | Report; ask user before proceeding |
| INFO | Style or convention deviation | Surface; continue |
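Because the severities are ordered, the next step of a review follows from the most severe finding alone. A minimal Python sketch of that rule follows, assuming that an assessment with no findings continues as if everything were INFO; the enum and function names are illustrative.

```python
from collections.abc import Iterable
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    ERROR = 3


ACTIONS = {
    Severity.ERROR: "stop: fix before proceeding",
    Severity.WARNING: "report: ask user before proceeding",
    Severity.INFO: "surface: continue",
}


def next_action(findings: Iterable[Severity]) -> str:
    """The most severe finding decides what happens next; no findings means continue."""
    worst = max(findings, default=Severity.INFO)
    return ACTIONS[worst]
```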

Provenance Labels

  • [official] — stated directly in Neo4j docs
  • [derived] — follows from documented behavior
  • [field] — community heuristic; treat as default but validate

Checklist

  • Use cases (≥5 queries) defined before modeling
  • Schema inspected on existing database before changes proposed
  • Every MERGE-target node label has a uniqueness constraint
  • No generic labels (:Entity, :Node, :Thing)
  • No generic relationship types (:RELATED_TO, :HAS, :CONNECTED_TO)
  • Relationship direction encodes semantic meaning
  • N-ary or propertied relationships use intermediate nodes
  • High-cardinality values stored as properties, not nodes
  • Low-cardinality categoricals used as labels, not property nodes
  • Embeddings on dedicated :Chunk nodes, not business nodes
  • Supernode candidates identified and mitigated
  • All DDL uses IF NOT EXISTS
  • Indexes polled to ONLINE before use
  • Assessment output follows the structured format above
  • Every prohibition paired with a concrete fix

References

Load on demand:

  • references/modeling-patterns.md — time-series, versioning, multi-tenancy, linked list, access control patterns
  • Neo4j Data Modeling Guide
  • Neo4j Modeling Tips
  • GraphAcademy: Graph Data Modeling Fundamentals
  • Super Nodes — All About Super Nodes (David Allen)
Repository
neo4j-contrib/neo4j-skills