neo4j-query-tuning-skill

Diagnoses and fixes slow Neo4j Cypher queries by reading execution plans, identifying bad operators (AllNodesScan, CartesianProduct, Eager, NodeByLabelScan), and prescribing fixes (indexes, hints, query rewrites, runtime selection). Use when a query is slow, when EXPLAIN or PROFILE output needs interpretation, when dbHits or pageCacheHitRatio are poor, when cardinality estimation diverges from actuals, or when deciding between slotted/pipelined/parallel runtimes. Covers USING INDEX / USING SCAN / USING JOIN hints, db.stats.retrieve, SHOW QUERIES, SHOW TRANSACTIONS, TERMINATE TRANSACTION. Does NOT write new Cypher from scratch — use neo4j-cypher-skill. Does NOT cover GDS algorithm tuning — use neo4j-gds-skill. Does NOT cover index/constraint creation syntax details — use neo4j-cypher-skill references/indexes.md.

Quality

96%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Securityby

Passed

No known issues

When to Use

Query takes unexpectedly long; need root-cause analysis
EXPLAIN/PROFILE output in hand — needs interpretation
Identifying which index is missing or unused
Deciding between slotted / pipelined / parallel runtimes
Monitoring live queries: SHOW QUERIES, SHOW TRANSACTIONS
Cardinality estimates wrong (plan replanning needed)

When NOT to Use

Writing Cypher from scratch → neo4j-cypher-skill
GDS algorithm performance → neo4j-gds-skill
Schema design / data modelling → neo4j-modeling-skill

EXPLAIN vs PROFILE

	EXPLAIN	PROFILE
Executes query?	No	Yes
Returns data?	No	Yes
Shows `rows` (actual)	No	Yes
Shows `dbHits` (actual)	No	Yes
Shows `estimatedRows`	Yes	Yes
Cost	Zero	Full query cost

Run PROFILE twice — first run warms page cache; second gives representative metrics.

EXPLAIN MATCH (p:Person {email: $email}) RETURN p.name
PROFILE MATCH (p:Person {email: $email}) RETURN p.name

Query API alternative (no driver):

curl -X POST https://<host>/db/<db>/query/v2 \
  -u <user>:<pass> -H "Content-Type: application/json" \
  -d '{"statement": "EXPLAIN MATCH (p:Person {email: $email}) RETURN p.name", "parameters": {"email": "a@b.com"}}'

Key Plan Metrics

Metric	Good	Investigate if
`dbHits`	Low; drops after index added	High relative to `rows`
`rows`	Shrinks early in plan	Large until final operator
`estimatedRows`	Close to `rows`	>10× divergence from actual
`pageCacheHitRatio`	>0.99	<0.90 (disk I/O bottleneck)
`pageCacheHits`	High	—
`pageCacheMisses`	Near 0	Rising (page cache too small)

Read plans bottom-up — leaf operators at bottom initiate data retrieval.

Operator Reference

Operator	Good/Bad	Meaning	Fix
`NodeIndexSeek`	✓	Exact match via RANGE/LOOKUP index	—
`NodeUniqueIndexSeek`	✓	Unique constraint index hit	—
`NodeIndexContainsScan`	✓	TEXT index CONTAINS / STARTS WITH	—
`NodeIndexScan`	~	Full index scan (no predicate)	Add WHERE predicate or composite index
`NodeByLabelScan`	✗	Scans all nodes of label	Add RANGE index on lookup property
`AllNodesScan`	✗✗	Scans entire node store	Add label + index to MATCH
`Expand(All)`	~	Traverse relationships from node	Normal; limit with LIMIT or WHERE
`Expand(Into)`	~	Find rels between two matched nodes	Normal for known-endpoint joins
`Filter`	~	Predicate applied after scan	Move predicate into WHERE with index
`CartesianProduct`	✗	No join predicate between two MATCH	Add WHERE join or use WITH between MATCHes
`NodeHashJoin`	~	Hash join on node IDs	Normal; planner chose hash join
`ValueHashJoin`	~	Hash join on values	Normal; watch memory for large inputs
`EagerAggregation`	~	Full aggregation (ORDER BY, count(*))	Normal for aggregates
`Aggregation`	✓	Streaming aggregation	—
`Eager`	✗	Read/write conflict; materialises all rows	See Eager fix strategies below
`Sort`	~	Full sort — O(n log n)	Add `LIMIT` before Sort; push LIMIT earlier
`Top`	✓	Sort+Limit combined — O(n log k)	Preferred over Sort+Limit
`Limit`	✓	Truncates rows early	Push as early as possible
`Skip`	~	Offset pagination	Use keyset pagination on large graphs
`ProduceResults`	—	Final output operator	Root of tree
`UndirectedRelationshipByIdSeekPipe`	~	Lookup by relationship ID	Avoid `id(r)` — use `elementId(r)`

Full operator reference → references/plan-operators.md

Diagnostic Workflow (Agent Runbook)

Step 1 — Baseline Plan

EXPLAIN <query>

Scan output for AllNodesScan, NodeByLabelScan, CartesianProduct, Eager.

Step 2 — Check Indexes

SHOW INDEXES YIELD name, type, labelsOrTypes, properties, state
WHERE state = 'ONLINE'

Find whether the label/property from the bad operator has an index.

Step 3 — Create Missing Index

// RANGE index for equality/range predicates:
CREATE INDEX person_email IF NOT EXISTS FOR (n:Person) ON (n.email)
// TEXT index for CONTAINS/ENDS WITH:
CREATE TEXT INDEX person_bio IF NOT EXISTS FOR (n:Person) ON (n.bio)
// Composite for multi-property lookup:
CREATE INDEX order_status_date IF NOT EXISTS FOR (n:Order) ON (n.status, n.createdAt)

Wait for state = 'ONLINE' before measuring.

Step 4 — Profile After Fix

PROFILE <query>

Compare dbHits and elapsed ms before/after. Target: NodeIndexSeek replaces scan operators.

Step 5 — Stale Statistics (if estimatedRows wildly off)

CALL db.prepareForReplanning()
// or resample a specific index:
CALL db.resampleIndex("person_email")
// or resample all outdated:
CALL db.resampleOutdatedIndexes()

Config: dbms.cypher.statistics_divergence_threshold (default 0.75 — plan expires when stat changes >75%).

Fixing Common Plan Problems

Missing Index → NodeByLabelScan / AllNodesScan

// Force index hint when planner ignores it:
MATCH (p:Person {email: $email})
USING INDEX p:Person(email)
RETURN p.name
// Force label scan (sometimes faster for high selectivity):
MATCH (p:Person {email: $email})
USING SCAN p:Person
RETURN p.name

Wrong Anchor — Planner Picks Wrong Starting Node

Reorder MATCH or use hints:

// Force join at specific node:
MATCH (a:Author)-[:WROTE]->(b:Book)-[:IN_CATEGORY]->(c:Category {name: $cat})
USING JOIN ON b
RETURN a.name, b.title

CartesianProduct — Two Unconnected MATCHes

// Bad (Cartesian product):
MATCH (a:Author {id: $aid})
MATCH (b:Book  {id: $bid})
RETURN a.name, b.title

// Good (explicit join or WITH):
MATCH (a:Author {id: $aid})-[:WROTE]->(b:Book {id: $bid})
RETURN a.name, b.title
// Or: WITH between them to reset planning context

Eager — Read/Write Conflict

Three strategies (pick simplest):

Add specific labels to MATCH nodes so planner distinguishes read/write sets
Collect-then-write: WITH collect(n) AS nodes UNWIND nodes AS n SET n.x = 1
CALL IN TRANSACTIONS: isolates each batch in its own transaction

CYPHER 25
MATCH (p:Person) WHERE p.score > 100
CALL (p) { SET p.tier = 'gold' } IN TRANSACTIONS OF 1000 ROWS

Expensive CONTAINS / ENDS WITH

// Needs TEXT index (RANGE does NOT support these):
CREATE TEXT INDEX person_bio IF NOT EXISTS FOR (n:Person) ON (n.bio)
MATCH (p:Person) WHERE p.bio CONTAINS $keyword RETURN p.name

Over-Traversal — Push LIMIT Early

// Bad: LIMIT after expensive join
MATCH (a:Author)-[:WROTE]->(b:Book)-[:REVIEWED_BY]->(r:Review)
RETURN a.name, b.title, r.text LIMIT 10

// Good: anchor limit before fan-out
MATCH (a:Author)-[:WROTE]->(b:Book)
WITH a, b LIMIT 10
MATCH (b)-[:REVIEWED_BY]->(r:Review)
RETURN a.name, b.title, r.text

Cypher Runtime Selection

Runtime	Select	Best For	Avoid When
`pipelined`	`CYPHER runtime=pipelined`	Default OLTP; streaming, low memory	Unsupported operators fall back to slotted
`slotted`	`CYPHER runtime=slotted`	Guaranteed stable behavior; debug	Performance-critical OLTP
`parallel`	`CYPHER 25 runtime=parallel`	Large analytical scans; aggregations	OLTP, writes, short queries, Aura Free

Pipelined is default for most queries. Parallel requires dbms.cypher.parallel.worker_limit configured; available on Enterprise and Aura Pro 2025+.

// Force parallel for large aggregation:
CYPHER 25 runtime=parallel
MATCH (n:Transaction) WHERE n.amount > 1000
RETURN n.currency, count(*), sum(n.amount)

Query Monitoring Commands

// Live queries + resource usage:
SHOW QUERIES YIELD query, queryId, elapsedTimeMillis, allocatedBytes, status, username

// Running transactions:
SHOW TRANSACTIONS YIELD transactionId, currentQuery, currentQueryProgress, elapsedTime, status, username, cpuTime, activeLockCount  // currentQueryProgress added [2026.03]

// Kill a specific transaction:
TERMINATE TRANSACTION $transactionId

// Kill a query:
TERMINATE QUERY $queryId

// Graph count stats (node/rel counts by label/type — feed into planner):
CALL db.stats.retrieve('GRAPH COUNTS') YIELD section, data RETURN section, data

// Token stats (label/property/rel-type IDs):
CALL db.stats.retrieve('TOKENS') YIELD section, data RETURN section, data

Full monitoring reference → references/stats-and-monitoring.md

Checklist

Repository: neo4j-contrib/neo4j-skills
Commit: 66ed0e1

Last updated: 1 day ago
Created: 1 day ago

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.