CtrlK
BlogDocsLog inGet started
Tessl Logo

neo4j-query-tuning-skill

Diagnoses and fixes slow Neo4j Cypher queries by reading execution plans, identifying bad operators (AllNodesScan, CartesianProduct, Eager, NodeByLabelScan), and prescribing fixes (indexes, hints, query rewrites, runtime selection). Use when a query is slow, when EXPLAIN or PROFILE output needs interpretation, when dbHits or pageCacheHitRatio are poor, when cardinality estimation diverges from actuals, or when deciding between slotted/pipelined/parallel runtimes. Covers USING INDEX / USING SCAN / USING JOIN hints, db.stats.retrieve, SHOW QUERIES, SHOW TRANSACTIONS, TERMINATE TRANSACTION. Does NOT write new Cypher from scratch — use neo4j-cypher-skill. Does NOT cover GDS algorithm tuning — use neo4j-gds-skill. Does NOT cover index/constraint creation syntax details — use neo4j-cypher-skill references/indexes.md.

76

Quality

96%

Does it follow best practices?

Impact

No eval scenarios have been run

SecuritybySnyk

Passed

No known issues

SKILL.md
Quality
Evals
Security

When to Use

  • Query takes unexpectedly long; need root-cause analysis
  • EXPLAIN/PROFILE output in hand — needs interpretation
  • Identifying which index is missing or unused
  • Deciding between slotted / pipelined / parallel runtimes
  • Monitoring live queries: SHOW QUERIES, SHOW TRANSACTIONS
  • Cardinality estimates wrong (plan replanning needed)

When NOT to Use

  • Writing Cypher from scratchneo4j-cypher-skill
  • GDS algorithm performanceneo4j-gds-skill
  • Schema design / data modellingneo4j-modeling-skill

EXPLAIN vs PROFILE

EXPLAINPROFILE
Executes query?NoYes
Returns data?NoYes
Shows rows (actual)NoYes
Shows dbHits (actual)NoYes
Shows estimatedRowsYesYes
CostZeroFull query cost

Run PROFILE twice — first run warms page cache; second gives representative metrics.

EXPLAIN MATCH (p:Person {email: $email}) RETURN p.name
PROFILE MATCH (p:Person {email: $email}) RETURN p.name

Query API alternative (no driver):

curl -X POST https://<host>/db/<db>/query/v2 \
  -u <user>:<pass> -H "Content-Type: application/json" \
  -d '{"statement": "EXPLAIN MATCH (p:Person {email: $email}) RETURN p.name", "parameters": {"email": "a@b.com"}}'

Key Plan Metrics

MetricGoodInvestigate if
dbHitsLow; drops after index addedHigh relative to rows
rowsShrinks early in planLarge until final operator
estimatedRowsClose to rows>10× divergence from actual
pageCacheHitRatio>0.99<0.90 (disk I/O bottleneck)
pageCacheHitsHigh
pageCacheMissesNear 0Rising (page cache too small)

Read plans bottom-up — leaf operators at bottom initiate data retrieval.


Operator Reference

OperatorGood/BadMeaningFix
NodeIndexSeekExact match via RANGE/LOOKUP index
NodeUniqueIndexSeekUnique constraint index hit
NodeIndexContainsScanTEXT index CONTAINS / STARTS WITH
NodeIndexScan~Full index scan (no predicate)Add WHERE predicate or composite index
NodeByLabelScanScans all nodes of labelAdd RANGE index on lookup property
AllNodesScan✗✗Scans entire node storeAdd label + index to MATCH
Expand(All)~Traverse relationships from nodeNormal; limit with LIMIT or WHERE
Expand(Into)~Find rels between two matched nodesNormal for known-endpoint joins
Filter~Predicate applied after scanMove predicate into WHERE with index
CartesianProductNo join predicate between two MATCHAdd WHERE join or use WITH between MATCHes
NodeHashJoin~Hash join on node IDsNormal; planner chose hash join
ValueHashJoin~Hash join on valuesNormal; watch memory for large inputs
EagerAggregation~Full aggregation (ORDER BY, count(*))Normal for aggregates
AggregationStreaming aggregation
EagerRead/write conflict; materialises all rowsSee Eager fix strategies below
Sort~Full sort — O(n log n)Add LIMIT before Sort; push LIMIT earlier
TopSort+Limit combined — O(n log k)Preferred over Sort+Limit
LimitTruncates rows earlyPush as early as possible
Skip~Offset paginationUse keyset pagination on large graphs
ProduceResultsFinal output operatorRoot of tree
UndirectedRelationshipByIdSeekPipe~Lookup by relationship IDAvoid id(r) — use elementId(r)

Full operator reference → references/plan-operators.md


Diagnostic Workflow (Agent Runbook)

Step 1 — Baseline Plan

EXPLAIN <query>

Scan output for AllNodesScan, NodeByLabelScan, CartesianProduct, Eager.

Step 2 — Check Indexes

SHOW INDEXES YIELD name, type, labelsOrTypes, properties, state
WHERE state = 'ONLINE'

Find whether the label/property from the bad operator has an index.

Step 3 — Create Missing Index

// RANGE index for equality/range predicates:
CREATE INDEX person_email IF NOT EXISTS FOR (n:Person) ON (n.email)
// TEXT index for CONTAINS/ENDS WITH:
CREATE TEXT INDEX person_bio IF NOT EXISTS FOR (n:Person) ON (n.bio)
// Composite for multi-property lookup:
CREATE INDEX order_status_date IF NOT EXISTS FOR (n:Order) ON (n.status, n.createdAt)

Wait for state = 'ONLINE' before measuring.

Step 4 — Profile After Fix

PROFILE <query>

Compare dbHits and elapsed ms before/after. Target: NodeIndexSeek replaces scan operators.

Step 5 — Stale Statistics (if estimatedRows wildly off)

CALL db.prepareForReplanning()
// or resample a specific index:
CALL db.resampleIndex("person_email")
// or resample all outdated:
CALL db.resampleOutdatedIndexes()

Config: dbms.cypher.statistics_divergence_threshold (default 0.75 — plan expires when stat changes >75%).


Fixing Common Plan Problems

Missing Index → NodeByLabelScan / AllNodesScan

// Force index hint when planner ignores it:
MATCH (p:Person {email: $email})
USING INDEX p:Person(email)
RETURN p.name
// Force label scan (sometimes faster for high selectivity):
MATCH (p:Person {email: $email})
USING SCAN p:Person
RETURN p.name

Wrong Anchor — Planner Picks Wrong Starting Node

Reorder MATCH or use hints:

// Force join at specific node:
MATCH (a:Author)-[:WROTE]->(b:Book)-[:IN_CATEGORY]->(c:Category {name: $cat})
USING JOIN ON b
RETURN a.name, b.title

CartesianProduct — Two Unconnected MATCHes

// Bad (Cartesian product):
MATCH (a:Author {id: $aid})
MATCH (b:Book  {id: $bid})
RETURN a.name, b.title

// Good (explicit join or WITH):
MATCH (a:Author {id: $aid})-[:WROTE]->(b:Book {id: $bid})
RETURN a.name, b.title
// Or: WITH between them to reset planning context

Eager — Read/Write Conflict

Three strategies (pick simplest):

  1. Add specific labels to MATCH nodes so planner distinguishes read/write sets
  2. Collect-then-write: WITH collect(n) AS nodes UNWIND nodes AS n SET n.x = 1
  3. CALL IN TRANSACTIONS: isolates each batch in its own transaction
CYPHER 25
MATCH (p:Person) WHERE p.score > 100
CALL (p) { SET p.tier = 'gold' } IN TRANSACTIONS OF 1000 ROWS

Expensive CONTAINS / ENDS WITH

// Needs TEXT index (RANGE does NOT support these):
CREATE TEXT INDEX person_bio IF NOT EXISTS FOR (n:Person) ON (n.bio)
MATCH (p:Person) WHERE p.bio CONTAINS $keyword RETURN p.name

Over-Traversal — Push LIMIT Early

// Bad: LIMIT after expensive join
MATCH (a:Author)-[:WROTE]->(b:Book)-[:REVIEWED_BY]->(r:Review)
RETURN a.name, b.title, r.text LIMIT 10

// Good: anchor limit before fan-out
MATCH (a:Author)-[:WROTE]->(b:Book)
WITH a, b LIMIT 10
MATCH (b)-[:REVIEWED_BY]->(r:Review)
RETURN a.name, b.title, r.text

Cypher Runtime Selection

RuntimeSelectBest ForAvoid When
pipelinedCYPHER runtime=pipelinedDefault OLTP; streaming, low memoryUnsupported operators fall back to slotted
slottedCYPHER runtime=slottedGuaranteed stable behavior; debugPerformance-critical OLTP
parallelCYPHER 25 runtime=parallelLarge analytical scans; aggregationsOLTP, writes, short queries, Aura Free

Pipelined is default for most queries. Parallel requires dbms.cypher.parallel.worker_limit configured; available on Enterprise and Aura Pro 2025+.

// Force parallel for large aggregation:
CYPHER 25 runtime=parallel
MATCH (n:Transaction) WHERE n.amount > 1000
RETURN n.currency, count(*), sum(n.amount)

Query Monitoring Commands

// Live queries + resource usage:
SHOW QUERIES YIELD query, queryId, elapsedTimeMillis, allocatedBytes, status, username

// Running transactions:
SHOW TRANSACTIONS YIELD transactionId, currentQuery, currentQueryProgress, elapsedTime, status, username, cpuTime, activeLockCount  // currentQueryProgress added [2026.03]

// Kill a specific transaction:
TERMINATE TRANSACTION $transactionId

// Kill a query:
TERMINATE QUERY $queryId

// Graph count stats (node/rel counts by label/type — feed into planner):
CALL db.stats.retrieve('GRAPH COUNTS') YIELD section, data RETURN section, data

// Token stats (label/property/rel-type IDs):
CALL db.stats.retrieve('TOKENS') YIELD section, data RETURN section, data

Full monitoring reference → references/stats-and-monitoring.md


Checklist

  • Run EXPLAIN first — identifies plan problems without execution cost
  • Check for AllNodesScan / NodeByLabelScan — missing index
  • Check for CartesianProduct — missing join predicate
  • Check for Eager — read/write conflict
  • SHOW INDEXES — confirm relevant index exists and state = 'ONLINE'
  • Create missing index; wait for ONLINE
  • Run PROFILE twice — first warms cache, second is representative
  • Compare dbHits before/after fix
  • If estimatedRows wildly off → CALL db.prepareForReplanning()
  • Push LIMIT / WITH n LIMIT k before high-fanout operations
  • For CONTAINS/ENDS WITH — TEXT index, not RANGE
  • For large analytical queries — consider runtime=parallel
  • Kill long-running queries with TERMINATE TRANSACTION
Repository
neo4j-contrib/neo4j-skills
Last updated
Created

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.