Design, review, and refactor Neo4j graph data models. Use when choosing node labels vs relationship types vs properties, migrating relational/document schemas to graph, detecting anti-patterns (generic labels, supernodes, missing constraints), designing intermediate nodes for n-ary relationships, enforcing schema with constraints and indexes, or assessing an existing model against graph modeling best practices. Does NOT handle Cypher query authoring — use neo4j-cypher-skill. Does NOT handle Spring Data Neo4j entity mapping — use neo4j-spring-data-skill. Does NOT handle GraphQL type definitions — use neo4j-graphql-skill. Does NOT handle data import — use neo4j-import-skill.
On an existing database, run these first. Never propose changes without knowing the current state:
CALL db.schema.visualization() YIELD nodes, relationships RETURN nodes, relationships;
SHOW CONSTRAINTS YIELD name, type, labelsOrTypes, properties RETURN name, type, labelsOrTypes, properties;
SHOW INDEXES YIELD name, type, labelsOrTypes, state WHERE state = 'ONLINE' RETURN name, type, labelsOrTypes;
If APOC is available:
CALL apoc.meta.schema() YIELD value RETURN value;
MCP tool map:
| Operation | Tool |
|---|---|
| Inspect schema | get-schema |
| SHOW CONSTRAINTS, SHOW INDEXES | read-cypher |
| CREATE CONSTRAINT ... IF NOT EXISTS | write-cypher (show + confirm first) |
- Add property type constraints (REQUIRE n.prop IS :: STRING) where the type is known: this helps the query planner and catches bad writes early
- No generic labels (:Entity, :Node, :Thing); no generic rel types (:RELATED_TO, :HAS)
- … Sec) so application code can reliably filter them out of the domain schema
- Run all DDL with IF NOT EXISTS so it is safe to rerun
- ALTER CURRENT GRAPH TYPE SET { … } or EXTEND GRAPH TYPE WITH { … } to declare the full model in one block instead of individual CREATE CONSTRAINT statements; see neo4j-cypher-skill/references/graph-type.md. PREVIEW: syntax may change before GA.

| Question | Answer | Model as |
|---|---|---|
| Is it a thing with identity, queried as entry point? | Yes | Node |
| Is it a connection between two things with direction? | Yes | Relationship |
| Does the connection have its own properties or multiple targets? | Yes | Intermediate node |
| Is it a scalar always returned with its parent, never filtered alone? | Yes | Property on parent |
| Is it a category used for type-based filtering or path traversal? | Yes | Label (not a property) |
| Does the same attribute value repeat across many nodes (low cardinality)? | Yes | Label, not a property node |
| Is it a fact connecting >2 entities? | Yes | Intermediate node |
| Use label when | Use property when |
|---|---|
| Values are few, fixed, used as traversal filters (WHERE n:Active) | Values are many, dynamic, or unique per node |
| You traverse by type (MATCH (n:VIPCustomer)) | You filter by value (WHERE n.tier = 'vip') |
| Category drives index selection | Fine-grained value drives range scans |
| Example: :Active, :Verified, :Premium | Example: status, score, email |
Rule: adding a label is cheap; scanning all :Label nodes is fast. Never model high-cardinality values as labels.
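As a minimal sketch of the rule above, a fixed, low-cardinality value can be promoted from a property to a label. The `:Customer` label and its `status` property are hypothetical names chosen for illustration; they do not come from an existing model.

```cypher
// Assumption: :Customer nodes carry a low-cardinality `status` property.
// Promote the fixed value 'active' to an :Active label.
MATCH (c:Customer)
WHERE c.status = 'active'
SET c:Active;

// Filtering now uses the label instead of a property comparison.
MATCH (c:Customer:Active)
RETURN count(c) AS activeCustomers;
```

Keeping both the label and the property long term reintroduces duplication, so one of the two should be dropped once the migration has been verified.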
Use when a relationship needs its own properties, connects >2 entities, or is independently queryable.
Before (relationship with property — limited):
(Person)-[:ACTED_IN {role: "Neo"}]->(Movie)
// Cannot query roles independently of movies
After (intermediate node: queryable, extensible):
(Person)-[:PLAYED]->(Role {name: "Neo"})-[:IN]->(Movie)
// MATCH (r:Role) WHERE r.name STARTS WITH 'Neo' RETURN r
Employment overlap example:
// Find colleagues who overlapped at same company
MATCH (p1:Person)-[:WORKED_AT]->(e1:Employment)-[:AT]->(c:Company)<-[:AT]-(e2:Employment)<-[:WORKED_AT]-(p2:Person)
WHERE p1 <> p2
AND e1.startDate <= e2.endDate AND e2.startDate <= e1.endDate
RETURN p1.name, p2.name, c.name
Promote relationship to intermediate node when:
| Relational construct | Graph equivalent | Notes |
|---|---|---|
| Table row | Node | One label per table (add more as needed) |
| Column (scalar) | Node property | |
| Primary key | Uniqueness constraint property | Use tmdbId, not id (too generic) |
| Foreign key | Relationship | Direction: from dependent → referenced |
| Many-to-many junction table | Intermediate node | Especially if junction has own columns |
| Junction table (no own columns) | Direct relationship | Simpler; upgrade to intermediate node later |
| NULL FK (optional relation) | Absent relationship | No node created; absence is the signal |
| Polymorphic FK (Rails-style) | Multiple labels or relationship types | Split into type-specific rels |
| Self-referential FK | Same-label relationship | :Employee {managerId} → (e)-[:REPORTS_TO]->(m) |
| Audit/history columns | Intermediate versioning node | See References for versioning pattern |
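The self-referential foreign key row in the table can be sketched as a one-off migration. This assumes each `:Employee` node was imported with the `managerId` column from the table plus a key property, here called `employeeId`, which is a hypothetical name.

```cypher
// Assumption: each :Employee was imported with an `employeeId` key and a
// nullable `managerId` property copied from the relational column.
MATCH (e:Employee)
WHERE e.managerId IS NOT NULL
MATCH (m:Employee {employeeId: e.managerId})
MERGE (e)-[:REPORTS_TO]->(m)
// Drop the foreign-key property once the relationship exists.
REMOVE e.managerId;
```

Rows with a NULL `managerId` produce no relationship, which matches the "absent relationship" row. Employees whose `managerId` points at a missing manager keep the property, so they remain easy to find afterwards.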
Detect:
// Find top-10 highest-degree nodes
MATCH (n)
RETURN labels(n) AS labels, elementId(n) AS id, count{ (n)--() } AS degree
ORDER BY degree DESC LIMIT 10
A node with degree >> median for its label is a supernode candidate. Any node with >100K relationships will degrade traversal queries that pass through it.
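Once a candidate is found, a per-type and per-direction breakdown can show which mitigation is likely to apply. The sketch below assumes a `$id` parameter holding one of the `id` values returned by the detection query.

```cypher
// $id is a parameter holding an elementId taken from the detection query.
MATCH (n)-[r]-()
WHERE elementId(n) = $id
RETURN type(r) AS relType,
       CASE WHEN startNode(r) = n THEN 'OUT' ELSE 'IN' END AS direction,
       count(r) AS degree
ORDER BY degree DESC;
```

A result dominated by one type and one direction suggests querying from the low-degree side, while many types of similar size suggests the node is serving several roles.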
Causes:
Mitigation strategies (in priority order):
| Strategy | When to use | Implementation |
|---|---|---|
| Query direction | Directional asymmetry exists | Query from low-degree side; exploit direction |
| Relationship type split | Supernode serves multiple roles | :FOLLOWS + :FAN instead of single :RELATED_TO |
| Label segregation | Supernode conflates entity types | :Celebrity vs :User → query only relevant subtype |
| Bucket pattern | Time-series or high-volume event nodes | See below |
| Avoid modeling | Low-cardinality categoricals | Use label instead of node (:Active not (:Status {name:"Active"})) |
| Join hint | Query tuning last resort | USING JOIN ON n in Cypher |
Bucket pattern (time-series / high-volume):
// Instead of: (:User)-[:VIEWED]->(:Page) (millions of rels per user)
// Bucket by hour:
(u:User)-[:VIEWED_IN]->(b:ViewBucket {userId: u.id, hour: '2025-04-28T14'})-[:VIEWED]->(p:Page)
// Query last hour's views without traversing full history:
MATCH (u:User {id: $uid})-[:VIEWED_IN]->(b:ViewBucket {hour: $hour})-[:VIEWED]->(p)
RETURN p.url

| Element | Convention | Good | Bad |
|---|---|---|---|
| Node label | PascalCase, singular noun | :Person, :BlogPost | :person, :blog_posts, :Entity |
| Relationship type | SCREAMING_SNAKE_CASE, verb phrase | :ACTED_IN, :WORKS_FOR | :actedin, :relatedTo, :HAS |
| Property key | camelCase | firstName, createdAt | FirstName, first_name |
| Constraint name | snake_case descriptive | person_id_unique | constraint1 |
| Index name | snake_case descriptive | person_name_idx | index2 |
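The casing rules in the table can be checked against a live database with the built-in `db.labels()` and `db.relationshipTypes()` procedures. This is a rough sketch: the regular expressions only test casing and cannot detect plural or generic names such as `:Entity` or `:HAS`.

```cypher
// Labels that are not PascalCase.
CALL db.labels() YIELD label
WHERE NOT label =~ '[A-Z][A-Za-z0-9]*'
RETURN 'label' AS element, label AS name
UNION ALL
// Relationship types that are not SCREAMING_SNAKE_CASE.
CALL db.relationshipTypes() YIELD relationshipType
WHERE NOT relationshipType =~ '[A-Z][A-Z0-9_]*'
RETURN 'relationshipType' AS element, relationshipType AS name
```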
Run all DDL with IF NOT EXISTS. Apply before importing data.
// 1. Uniqueness constraint — every node type used in MERGE
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE;
// 2. Existence constraint (Enterprise) — mandatory properties
CREATE CONSTRAINT person_name_exists IF NOT EXISTS
FOR (p:Person) REQUIRE p.name IS NOT NULL;
// 3. Property type constraint (Enterprise) — enforce data type
CREATE CONSTRAINT person_born_integer IF NOT EXISTS
FOR (p:Person) REQUIRE p.born IS :: INTEGER;
// 4. Key constraint (Enterprise) — unique + exists in one
CREATE CONSTRAINT movie_tmdbid_key IF NOT EXISTS
FOR (m:Movie) REQUIRE m.tmdbId IS NODE KEY;
// 5. Range index — equality and range filters on properties
CREATE INDEX person_name_idx IF NOT EXISTS
FOR (p:Person) ON (p.name);
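// 6. Fulltext index: tokenized free-text search, queried via db.index.fulltext.queryNodes (plain CONTAINS / ENDS WITH predicates need a TEXT index; STARTS WITH can use a range index)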
CREATE FULLTEXT INDEX person_fulltext IF NOT EXISTS
FOR (n:Person) ON EACH [n.name, n.bio];
// 7. Vector index — embedding similarity search
CREATE VECTOR INDEX chunk_embedding_idx IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS { indexConfig: { `vector.dimensions`: 1536, `vector.similarity_function`: 'cosine' } };
// 8. Relationship index — filter on rel properties
CREATE INDEX acted_in_year_idx IF NOT EXISTS
FOR ()-[r:ACTED_IN]-() ON (r.year);
After creating indexes, poll until ONLINE:
SHOW INDEXES YIELD name, state WHERE state <> 'ONLINE' RETURN name, state;
Do NOT use an index until state = ONLINE.
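As an alternative to polling by hand, the built-in `db.awaitIndexes` procedure blocks until all indexes are online. The sketch below then queries the `person_fulltext` index from the listing above through its procedure. The 300-second timeout and the search string `'neo*'` are illustrative assumptions.

```cypher
// Block until all indexes are online, or fail after 300 seconds.
CALL db.awaitIndexes(300);

// Fulltext indexes are queried through a procedure, not a WHERE predicate.
CALL db.index.fulltext.queryNodes('person_fulltext', 'neo*')
YIELD node, score
RETURN node.name AS name, score
ORDER BY score DESC
LIMIT 10;
```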
Store embeddings on dedicated :Chunk nodes, never on business nodes:
(:Document)-[:HAS_CHUNK]->(c:Chunk {text: "...", embedding: [...]})
Rules:
- :Chunk properties: text (source text), embedding (float array), chunkIndex (int)
- Vector index on c.embedding only
- Never store embeddings on :Document: it makes the node too large and pollutes traversal

| Anti-pattern | Problem | Fix |
|---|---|---|
| Generic labels :Entity, :Node | No filtering benefit; all nodes scan | Use domain labels :Person, :Product |
| Generic rel types :RELATED_TO, :HAS | Can't filter by relationship type | Use semantic types :PURCHASED, :AUTHORED |
| Low-cardinality value as node | Supernode (:Status {name:"active"} → millions of edges) | Use label :Active instead |
| Property as label (n.type = 'VIP' + :VIP label both exist) | Inconsistency, duplication | Pick one; prefer label if used in traversal |
| Storing embeddings on business node | Node bloat, slow traversal | Dedicated :Chunk node |
| MERGE without uniqueness constraint | Duplicate nodes silently created | Add constraint before any MERGE |
| Missing relationship direction meaning | Arbitrary direction; confusing model | Direction = semantic flow of action |
| Junction table modeled as bare property | Loses history and extensibility | Intermediate node with its own properties |
| id as property name | id(n) is a deprecated Cypher function (use elementId(n)); bare id is fine as a property name in practice, but domain-qualified names (personId, movieId) are clearer and avoid any future ambiguity | Prefer personId, movieId, tmdbId where it aids readability |
| All dates as strings | No range queries; no temporal operators | Use Neo4j date() or datetime() type |
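For the "MERGE without uniqueness constraint" row, existing duplicates have to be found and merged first, because a uniqueness constraint cannot be created while they exist. This minimal sketch assumes the `:Person` label and `id` property used by the `person_id_unique` constraint elsewhere in this document.

```cypher
// List id values shared by more than one :Person node.
MATCH (p:Person)
WHERE p.id IS NOT NULL
WITH p.id AS id, count(*) AS copies
WHERE copies > 1
RETURN id, copies
ORDER BY copies DESC;
```

An empty result means the constraint can be created safely.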
When reviewing an existing model:
## Schema Assessment
### Compliant
- [constraint / pattern that is correct]
### Issues Found
#### [Title] — Severity: ERROR / WARNING / INFO
- **Current**: what the model does
- **Problem**: why it is an issue
- **Fix**: specific Cypher DDL or model change
## Recommended Schema
### Node Labels
- :Label {key: TYPE, prop: TYPE, ...} → constraints: [list]
### Relationships
- (:LabelA)-[:TYPE {prop: TYPE}]->(:LabelB)
### Constraints to Create
[CREATE CONSTRAINT ... statements]
### Indexes to Create
[CREATE INDEX ... statements]
Severity semantics:
| Severity | Meaning | Action |
|---|---|---|
| ERROR | Model correctness failure (duplicates possible, data loss risk) | Stop; fix before proceeding |
| WARNING | Performance or extensibility risk | Report; ask user before proceeding |
| INFO | Style or convention deviation | Surface; continue |
- [official] - stated directly in Neo4j docs
- [derived] - follows from documented behavior
- [field] - community heuristic; treat as default but validate

- No generic labels (:Entity, :Node, :Thing)
- No generic rel types (:RELATED_TO, :HAS, :CONNECTED_TO)
- Embeddings on :Chunk nodes, not business nodes
- All DDL uses IF NOT EXISTS

Load on demand: