Import structured data into Neo4j — LOAD CSV, CALL IN TRANSACTIONS, neo4j-admin database import full (offline bulk), apoc.load.csv/json, apoc.periodic.iterate, driver batch writes. Covers method selection, header file format, type coercion, null handling, ON ERROR modes, CONCURRENT TRANSACTIONS, pre-import constraint setup, and post-import validation. Use when importing CSV/JSON/Parquet files, migrating relational data to graph, or bulk-loading large datasets. Does NOT handle unstructured document/PDF/vector chunking pipelines — use neo4j-document-import-skill. Does NOT handle live app write patterns (MERGE/CREATE) — use neo4j-cypher-skill. Does NOT handle neo4j-admin backup/restore/config — use neo4j-cli-tools-skill.
| Dataset size | DB state | Source | Method |
|---|---|---|---|
| Any size | Online | CSV (Aura or local) | LOAD CSV + CALL IN TRANSACTIONS |
| < 1M rows | Online | List/API response | UNWIND + CALL IN TRANSACTIONS |
| > 10M rows | Offline (local/self-managed) | CSV / Parquet | neo4j-admin database import full |
| Any size | Online | APOC available | apoc.periodic.iterate + apoc.load.csv |
| Any size | Online | JSON/API | apoc.load.json or driver batching |
| Incremental delta | Offline (Enterprise) | CSV | neo4j-admin database import incremental |
Aura: only https:// URLs — no file:///. Use neo4j-admin import only on self-managed.
Run in this exact order — skipping causes hard-to-debug duplicates or missed index usage:
1. Create constraints BEFORE import
2. Run the import
3. Create additional indexes AFTER import
4. Poll populationPercent until 100%

Create uniqueness constraints (enables the index used by MERGE):
CREATE CONSTRAINT IF NOT EXISTS FOR (n:Person) REQUIRE n.id IS UNIQUE;
CREATE CONSTRAINT IF NOT EXISTS FOR (n:Movie) REQUIRE n.movieId IS UNIQUE;

Neo4j 2026.02+ (Enterprise/Aura) — PREVIEW: ALTER CURRENT GRAPH TYPE SET { … } can replace all individual constraint statements with a single declarative block. See neo4j-cypher-skill/references/graph-type.md. Use individual CREATE CONSTRAINT on older versions or Community Edition.
Verify APOC if using apoc.* procedures:
RETURN apoc.version();
If this fails → APOC is not installed. Use plain LOAD CSV instead.
Confirm target is PRIMARY (not replica):
CALL dbms.cluster.role() YIELD role RETURN role;
If role ≠ PRIMARY → stop. Redirect writes to the PRIMARY endpoint.
Count source file rows before import (catch encoding issues early):
wc -l data/persons.csv   # Linux/macOS

Verify UTF-8 encoding — LOAD CSV requires UTF-8. Re-encode if needed:
file -i persons.csv                                        # Check encoding
iconv -f latin1 -t utf-8 persons.csv > persons_utf8.csv

CYPHER 25
LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row
CALL (row) {
MERGE (p:Person {id: row.id})
ON CREATE SET
p.name = row.name,
p.age = toIntegerOrNull(row.age),
p.score = toFloatOrNull(row.score),
p.active = toBoolean(row.active),
p.born = CASE WHEN row.born IS NOT NULL AND row.born <> '' THEN date(row.born) ELSE null END,
p.createdAt = datetime()
ON MATCH SET
p.updatedAt = datetime()
} IN TRANSACTIONS OF 10000 ROWS
ON ERROR CONTINUE
REPORT STATUS AS s
RETURN s.transactionId, s.committed, s.errorMessage

Null/empty-string rules:
- Missing field → null (safe)
- "" → stored as "" not null — use nullIf(row.x, '') to convert
- toInteger(null) throws → always use toIntegerOrNull()
- toFloat(null) throws → always use toFloatOrNull()
- Never SET null properties — they are silently dropped on SET

CYPHER 25
LOAD CSV WITH HEADERS FROM 'file:///knows.csv' AS row
CALL (row) {
MATCH (a:Person {id: row.fromId})
MATCH (b:Person {id: row.toId})
MERGE (a)-[:KNOWS {since: toIntegerOrNull(row.year)}]->(b)
} IN TRANSACTIONS OF 5000 ROWS
ON ERROR CONTINUE
REPORT STATUS AS s

Always import ALL nodes before ANY relationships — MATCH fails on missing nodes.
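Because MATCH silently skips rows whose endpoints do not exist yet, a cheap preflight is to diff the relationship file's endpoint IDs against the node file before loading. A minimal sketch (file layout and column names mirror the persons.csv/knows.csv examples above and are otherwise assumptions):

```python
import csv

def missing_endpoints(nodes_csv, rels_csv, node_key="id",
                      from_col="fromId", to_col="toId"):
    """Return relationship endpoint IDs that have no matching node row."""
    with open(nodes_csv, newline="", encoding="utf-8") as f:
        known = {row[node_key] for row in csv.DictReader(f)}
    missing = set()
    with open(rels_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # collect any endpoint not present in the node file
            missing.update({row[from_col], row[to_col]} - known)
    return missing
```

An empty result means every relationship row will MATCH both endpoints; anything returned would be silently dropped by the import above.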
CYPHER 25
LOAD CSV WITH HEADERS FROM 'file:///data.tsv' AS row FIELDTERMINATOR '\t'
CALL (row) { MERGE (p:Person {id: row.id}) }
IN TRANSACTIONS OF 10000 ROWS ON ERROR CONTINUE

Compressed files load directly:
LOAD CSV WITH HEADERS FROM 'file:///archive.csv.gz' AS row ...

| Scheme | Example |
|---|---|
| AWS S3 | s3://my-bucket/data/persons.csv |
| Google Cloud Storage | gs://my-bucket/persons.csv |
| Azure Blob | azb://account/container/persons.csv |
linenumber()  // current line number — use as fallback ID
file()        // absolute path of file being loaded

Syntax:
CALL (row) {
// write logic
} IN [n CONCURRENT] TRANSACTIONS
[OF batchSize ROW[S]]
[ON ERROR {CONTINUE | BREAK | FAIL | RETRY [FOR duration SECONDS] [THEN {CONTINUE|BREAK|FAIL}]}]
[REPORT STATUS AS statusVar]

| Mode | Behavior | Use when |
|---|---|---|
| ON ERROR FAIL | Default. Rolls back entire outer tx on first error | All-or-nothing strict import |
| ON ERROR CONTINUE | Skips failed batch, continues remaining batches | Resilient bulk load — track errors via REPORT STATUS |
| ON ERROR BREAK | Stops after first failed batch; keeps completed work | Semi-strict: stop early, keep successful batches |
| ON ERROR RETRY | Exponential backoff retry (default 30s) + fallback | Concurrent writes with deadlock risk |
ON ERROR CONTINUE/BREAK → outer transaction succeeds even if inner batches fail.
ON ERROR FAIL → cannot be combined with REPORT STATUS AS.
CYPHER 25
LOAD CSV WITH HEADERS FROM 'file:///large.csv' AS row
CALL (row) {
MERGE (p:Person {id: row.id}) SET p.name = row.name
} IN 4 CONCURRENT TRANSACTIONS OF 5000 ROWS
ON ERROR RETRY FOR 30 SECONDS THEN CONTINUE
REPORT STATUS AS s

Use CONCURRENT for read-heavy MERGE on non-overlapping key spaces. Risk: deadlocks on overlapping writes → combine with ON ERROR RETRY.
| Column | Type | Meaning |
|---|---|---|
| s.started | BOOLEAN | Batch transaction started |
| s.committed | BOOLEAN | Batch committed successfully |
| s.transactionId | STRING | Transaction ID |
| s.errorMessage | STRING or null | Error detail if batch failed |
| Row count | Recommended batch size | Notes |
|---|---|---|
| < 100k | 10 000 | Default is fine |
| 100k – 1M | 10 000 – 50 000 | Monitor heap; increase if fast |
| 1M – 10M | 50 000 – 100 000 | Enable CONCURRENT if CPUs available |
| > 10M online | 50 000 | Consider neo4j-admin import instead |
| Relationship import | 5 000 | Lower — each batch does 2x MATCH |
Fastest method: ~3 min for 31M nodes / 78M rels on SSD. DB must be stopped or non-existent.
neo4j-admin database import full \
--nodes=Person="persons_header.csv,persons.csv" \
--nodes=Movie="movies_header.csv,movies.csv" \
--relationships=ACTED_IN="acted_in_header.csv,acted_in.csv" \
--relationships=DIRECTED="directed_header.csv,directed.csv" \
--delimiter=, \
--id-type=STRING \
--bad-tolerance=0 \
--threads=$(nproc) \
--high-parallel-io=on \
neo4j

For SSDs: always set --high-parallel-io=on. For large graphs (>34B nodes/rels): --format=block.
Dry run (2026.02+) — validate without writing:
neo4j-admin database import full --dry-run ...

# persons_header.csv
personId:ID,name,born:int,score:float,active:boolean,:LABEL

# persons.csv (data file — no header row)
p001,Alice,1985,9.2,true,Person
p002,Bob,1990,7.1,false,Person

| Field | Meaning |
|---|---|
| :ID | Unique ID for relationship wiring (not stored as property by default) |
| :ID(Group) | Scoped ID space — use when node types share IDs |
| :LABEL | One or more labels; semicolon-separated: Person;Employee |
| prop:int | Typed property; types: int long float double boolean byte short string |
| prop:date | Temporal: date localtime time localdatetime datetime duration |
| prop:int[] | Array — semicolon-separated values in cell: 1;2;3 |
| prop:vector | Float vector (2025.10+) — semicolon-separated coordinates |
# acted_in_header.csv
:START_ID(Person),:END_ID(Movie),role,:TYPE

# acted_in.csv
p001,tt0133093,Neo,ACTED_IN
p002,tt0133093,Morpheus,ACTED_IN

:START_ID / :END_ID must reference the same :ID group as the node files.
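When these files are generated from code, keep the header and the data in two separate files exactly as neo4j-admin expects. A hypothetical helper (file prefix and columns are illustrative, matching the acted_in example above):

```python
import csv

def write_admin_csvs(prefix, header_fields, rows):
    """Write a neo4j-admin header file plus a matching headerless data file."""
    with open(f"{prefix}_header.csv", "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(header_fields)  # one header row, nothing else
    with open(f"{prefix}.csv", "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)          # data only — no header row

write_admin_csvs(
    "acted_in",
    [":START_ID(Person)", ":END_ID(Movie)", "role", ":TYPE"],
    [["p001", "tt0133093", "Neo", "ACTED_IN"],
     ["p002", "tt0133093", "Morpheus", "ACTED_IN"]],
)
```

The resulting pair plugs straight into `--relationships=ACTED_IN="acted_in_header.csv,acted_in.csv"`.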
| Flag | Default | Notes |
|---|---|---|
| --delimiter | , | Single char or TAB |
| --id-type | STRING | STRING, INTEGER, or ACTUAL |
| --bad-tolerance | -1 (unlimited, changed 2025.12) | Set 0 for strict prod imports |
| --threads | CPU count | Set explicitly on shared hosts |
| --max-off-heap-memory | 90% RAM | Reduce if other services share host |
| --high-parallel-io | off | Set on for SSD/NVMe |
| --format | standard | block for >34B nodes/rels |
| --overwrite-destination | false | Required if DB already exists |
| --dry-run | false | 2026.02+ — validate without writing |
Pass a Cypher file with CREATE CONSTRAINT / CREATE INDEX statements; executed automatically after import completes. Constraints are created first (correct order enforced). File paths can be local or remote (s3://, gs://, https://).
neo4j-admin database import full \
--format=block \
--schema=schema.cypher \
--nodes=Person="persons_header.csv,persons.csv" \
neo4j

// schema.cypher
CREATE CONSTRAINT person_id IF NOT EXISTS FOR (n:Person) REQUIRE n.id IS UNIQUE;
CREATE CONSTRAINT movie_id IF NOT EXISTS FOR (n:Movie) REQUIRE n.id IS UNIQUE;
CREATE RANGE INDEX person_email IF NOT EXISTS FOR (n:Person) ON (n.email);
CREATE TEXT INDEX movie_title IF NOT EXISTS FOR (n:Movie) ON (n.title);

For incremental import, DROP CONSTRAINT / DROP INDEX are also supported [2025.02+] — used to remove indexes before the merge phase and recreate them after for faster writes.
Three-phase process — use when DB must stay online during import preparation:
# Phase 1: Prepare staging area
neo4j-admin database import incremental --stage=prepare \
--nodes=Person=persons_header.csv,delta.csv --force neo4j
# Phase 2: Build indexes (DB can be read-only during this phase)
neo4j-admin database import incremental --stage=build neo4j
# Phase 3: Merge into live database (brief write-lock)
neo4j-admin database import incremental --stage=merge neo4j

Requires Enterprise Edition + block store format.
Verify first: RETURN apoc.version() — if fails, use LOAD CSV or driver instead.
CALL apoc.periodic.iterate(
"MATCH (p:Person) WHERE NOT (p)-[:HAS_ACCOUNT]->() RETURN p",
"CREATE (p)-[:HAS_ACCOUNT]->(a:Account {id: randomUUID()})",
{batchSize: 10000, parallel: false, retries: 2}
) YIELD batches, total, errorMessages
RETURN batches, total, errorMessages

| Config key | Default | Notes |
|---|---|---|
| batchSize | 10000 | Rows per inner transaction |
| parallel | false | Enable for non-overlapping writes; risk: deadlocks |
| retries | 0 | Retry failed batches N times with 100ms delay |
Prefer CALL IN TRANSACTIONS (native Cypher) over apoc.periodic.iterate for new code — it has REPORT STATUS, CONCURRENT, and RETRY built in without APOC dependency.
CALL apoc.load.csv('file:///persons.csv', {
header: true,
sep: ',',
skip: 1,
limit: 1000000
}) YIELD lineNo, map, list
CALL (map) {
MERGE (p:Person {id: map.id}) SET p.name = map.name
} IN TRANSACTIONS OF 10000 ROWS ON ERROR CONTINUE

CALL apoc.load.json('https://api.example.com/persons') YIELD value
CALL (value) {
MERGE (p:Person {id: value.id}) SET p.name = value.name
} IN TRANSACTIONS OF 1000 ROWS ON ERROR CONTINUE

Driver batch writes: use when the source is not a file (API responses, DB migrations). Collect rows into BATCH_SIZE (10 000) lists and run UNWIND $rows AS row MERGE ... once per batch — ~10x faster than row-at-a-time writes.
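The batching pattern can be sketched with the official Neo4j Python driver (a sketch, not a drop-in: the URI, credentials, and the Person merge are placeholders):

```python
BATCH_SIZE = 10_000

MERGE_ROWS = """
UNWIND $rows AS row
MERGE (p:Person {id: row.id})
SET p.name = row.name
"""

def chunked(rows, size=BATCH_SIZE):
    """Yield fixed-size batches from a list of row dicts."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def import_rows(uri, auth, rows, database="neo4j"):
    from neo4j import GraphDatabase  # official Python driver (third-party)
    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session(database=database) as session:
            for batch in chunked(rows):
                # one managed (auto-retried) transaction per batch
                session.execute_write(
                    lambda tx, b=batch: tx.run(MERGE_ROWS, rows=b).consume()
                )
```

One managed transaction per 10k-row batch keeps memory bounded and gets the driver's built-in retry on transient errors; the JS driver follows the same shape with `session.executeWrite`.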
| Operation | MCP tool | Notes |
|---|---|---|
| SHOW CONSTRAINTS, SHOW INDEXES | read-cypher | Always inspect before import |
| CREATE CONSTRAINT, CREATE INDEX | write-cypher | Gate: show planned constraint, confirm |
| LOAD CSV / CALL IN TRANSACTIONS | write-cypher | Gate: show row count + Cypher, confirm |
| Verify counts | read-cypher | Post-import: MATCH (n:Label) RETURN count(n) |
| Poll index state | read-cypher | Poll until all state = 'ONLINE' |
Write gate — before any bulk write via MCP, show:
- the exact Cypher
- the source row count
- the EXPLAIN plan

Wait for user confirmation. Never auto-execute CALL IN TRANSACTIONS or CREATE CONSTRAINT without confirmation.
Always pass database param if not default: {"code": "...", "database": "neo4j"}.
| Error | Cause | Fix |
|---|---|---|
| Couldn't load the external resource | file:/// path not in Neo4j import dir | Move file to $NEO4J_HOME/import/; check dbms.security.allow_csv_import_from_file_urls=true |
| Cannot merge node using null property value | MERGE key resolved to null | Validate row.id IS NOT NULL before MERGE; add WHERE row.id IS NOT NULL |
| toInteger() called on null | Null column fed to non-null-safe fn | Replace toInteger() → toIntegerOrNull(), toFloat() → toFloatOrNull() |
| Node N already exists / constraint violation mid-import | Duplicate source IDs | Dedup source CSV; use MERGE not CREATE; add IF NOT EXISTS to constraint |
| Heap overflow / OutOfMemoryError | Batch too large or file too large | Reduce batch size; switch to CALL IN TRANSACTIONS; neo4j-admin for offline |
| Invalid input 'IN': expected ... | USING PERIODIC COMMIT used | Replace USING PERIODIC COMMIT → CALL IN TRANSACTIONS — PERIODIC COMMIT removed in Cypher 25 |
| neo4j-admin: Bad input data | Wrong header format or type mismatch | Check :ID, :START_ID, :END_ID present; check typed columns parse correctly |
| neo4j-admin: import fails silently | --bad-tolerance default was unlimited pre-2025.12 | Set --bad-tolerance=0 to surface all errors |
| Index not used during MERGE | Constraint not created before import | Drop data, create constraint, re-import |
| Relationship import missing nodes | Relationships imported before nodes | Always import ALL node files before ANY relationship files |
After import completes — run all:
// Row counts per label
MATCH (n:Person) RETURN count(n) AS persons;
MATCH ()-[:KNOWS]->() RETURN count(*) AS knows_rels;
// After import: create additional non-unique indexes (populated async)
CREATE TEXT INDEX movie_title IF NOT EXISTS FOR (n:Movie) ON (n.title);
CREATE RANGE INDEX person_born IF NOT EXISTS FOR (n:Person) ON (n.born);
// Poll population — wait until populationPercent = 100 before opening to queries
SHOW INDEXES YIELD name, state, populationPercent
WHERE state <> 'ONLINE' OR populationPercent < 100
RETURN name, state, populationPercent
ORDER BY populationPercent;
// Spot check: null keys = import bug
MATCH (p:Person) WHERE p.id IS NULL RETURN count(p) AS missing_id;

Do NOT run production queries until all indexes are ONLINE.
Checklist:
- APOC verified (RETURN apoc.version()) before using apoc.* procedures
- toIntegerOrNull() / toFloatOrNull() — never bare toInteger()/toFloat()
- nullIf(row.x, '') applied where empty string ≠ null
- CALL IN TRANSACTIONS used (not USING PERIODIC COMMIT)
- ON ERROR CONTINUE + REPORT STATUS for production loads
- --bad-tolerance=0 set; --high-parallel-io=on for SSD
- Credentials in .env; .env in .gitignore