neo4j-import-skill

Import structured data into Neo4j — LOAD CSV, CALL IN TRANSACTIONS, neo4j-admin database import full (offline bulk), apoc.load.csv/json, apoc.periodic.iterate, driver batch writes. Covers method selection, header file format, type coercion, null handling, ON ERROR modes, CONCURRENT TRANSACTIONS, pre-import constraint setup, and post-import validation. Use when importing CSV/JSON/Parquet files, migrating relational data to graph, or bulk-loading large datasets. Does NOT handle unstructured document/PDF/vector chunking pipelines — use neo4j-document-import-skill. Does NOT handle live app write patterns (MERGE/CREATE) — use neo4j-cypher-skill. Does NOT handle neo4j-admin backup/restore/config — use neo4j-cli-tools-skill.

Neo4j Import Skill

When to Use

  • Importing CSV, JSON, or Parquet files into Neo4j
  • Batch-upserting nodes and relationships (UNWIND + CALL IN TRANSACTIONS)
  • Migrating relational data (SQL → graph)
  • Bulk-loading large datasets offline (neo4j-admin import)
  • Choosing between online (Cypher) and offline (admin) import methods
  • Verifying import completeness (counts, constraints, index states)

When NOT to Use

  • Unstructured docs, PDFs, vector chunks → neo4j-document-import-skill
  • Live application writes (MERGE/CREATE in app code) → neo4j-cypher-skill
  • neo4j-admin backup/restore/config → neo4j-cli-tools-skill
  • GDS algorithm projection from an existing graph → neo4j-gds-skill

Method Decision Table

| Dataset size | DB state | Source | Method |
|---|---|---|---|
| Any size | Online | CSV (Aura or local) | LOAD CSV + CALL IN TRANSACTIONS |
| < 1M rows | Online | List/API response | UNWIND + CALL IN TRANSACTIONS |
| > 10M rows | Offline (local/self-managed) | CSV / Parquet | neo4j-admin database import full |
| Any size | Online | APOC available | apoc.periodic.iterate + apoc.load.csv |
| Any size | Online | JSON/API | apoc.load.json or driver batching |
| Incremental delta | Offline (Enterprise) | CSV | neo4j-admin database import incremental |

Aura: only https:// URLs — no file:///. Use neo4j-admin import only on self-managed.
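
On Aura, a minimal sketch loading from a hosted file (the URL is illustrative):

LOAD CSV WITH HEADERS FROM 'https://data.example.com/persons.csv' AS row
CALL (row) { MERGE (p:Person {id: row.id}) } IN TRANSACTIONS OF 10000 ROWS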


Pre-Import Checklist

Run in this exact order — skipping a step causes hard-to-debug duplicates or missed index usage:

Constraints BEFORE import. Additional indexes AFTER import.

  • Constraints create implicit RANGE indexes used by MERGE during load + enforce uniqueness
  • Additional non-unique indexes (TEXT, RANGE on non-key props, FULLTEXT) created after load — Neo4j populates them async from the committed data; poll populationPercent until 100%
  • Creating extra indexes before import slows every write during load with no benefit
  1. Create uniqueness constraints (enables index used by MERGE):

    CREATE CONSTRAINT IF NOT EXISTS FOR (n:Person) REQUIRE n.id IS UNIQUE;
    CREATE CONSTRAINT IF NOT EXISTS FOR (n:Movie)  REQUIRE n.movieId IS UNIQUE;

    Neo4j 2026.02+ (Enterprise/Aura) — PREVIEW: ALTER CURRENT GRAPH TYPE SET { … } can replace all individual constraint statements with a single declarative block. See neo4j-cypher-skill/references/graph-type.md. Use individual CREATE CONSTRAINT on older versions or Community Edition.

  2. Verify APOC if using apoc.* procedures:

    RETURN apoc.version();

    If this fails → APOC is not installed. Use plain LOAD CSV instead.

  3. Confirm target is PRIMARY (not replica):

    CALL dbms.cluster.role() YIELD role RETURN role;

    If role ≠ PRIMARY → stop. Redirect writes to the PRIMARY endpoint.

  4. Count source file rows before import (catch encoding issues early):

    wc -l data/persons.csv    # Linux/macOS
  5. Verify UTF-8 encoding — LOAD CSV requires UTF-8. Re-encode if needed:

    file -i persons.csv       # Check encoding
    iconv -f latin1 -t utf-8 persons.csv > persons_utf8.csv

LOAD CSV Patterns

Basic node import with type coercion and null handling

CYPHER 25
LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row
CALL (row) {
  MERGE (p:Person {id: row.id})
  ON CREATE SET
    p.name       = row.name,
    p.age        = toIntegerOrNull(row.age),
    p.score      = toFloatOrNull(row.score),
    p.active     = toBoolean(row.active),
    p.born       = CASE WHEN row.born IS NOT NULL AND row.born <> '' THEN date(row.born) ELSE null END,
    p.createdAt  = datetime()
  ON MATCH SET
    p.updatedAt  = datetime()
} IN TRANSACTIONS OF 10000 ROWS
  ON ERROR CONTINUE
  REPORT STATUS AS s
RETURN s.transactionId, s.committed, s.errorMessage

Null/empty-string rules:

  • CSV missing column → null (safe)
  • CSV empty string "" → stored as "" not null — use nullIf(row.x, '') to convert
  • toInteger(null) throws → always use toIntegerOrNull()
  • toFloat(null) throws → always use toFloatOrNull()
  • Neo4j never stores null properties — they are silently dropped on SET
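
Applied together, a minimal sketch (column names are illustrative):

LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row
WITH row WHERE row.id IS NOT NULL AND row.id <> ''
MERGE (p:Person {id: row.id})
SET p.nickname = nullIf(row.nickname, ''),  // '' becomes null, so no property is written
    p.age      = toIntegerOrNull(row.age)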

Relationship import (nodes must exist first)

CYPHER 25
LOAD CSV WITH HEADERS FROM 'file:///knows.csv' AS row
CALL (row) {
  MATCH (a:Person {id: row.fromId})
  MATCH (b:Person {id: row.toId})
  MERGE (a)-[:KNOWS {since: toIntegerOrNull(row.year)}]->(b)
} IN TRANSACTIONS OF 5000 ROWS
  ON ERROR CONTINUE
  REPORT STATUS AS s

Always import ALL nodes before ANY relationships — MATCH fails on missing nodes.

Tab-separated or custom delimiter

CYPHER 25
LOAD CSV WITH HEADERS FROM 'file:///data.tsv' AS row FIELDTERMINATOR '\t'
CALL (row) { MERGE (p:Person {id: row.id}) }
IN TRANSACTIONS OF 10000 ROWS ON ERROR CONTINUE

Compressed files (ZIP / gzip — local files only)

LOAD CSV WITH HEADERS FROM 'file:///archive.csv.gz' AS row ...

Cloud storage (Enterprise Edition)

| Scheme | Example |
|---|---|
| AWS S3 | s3://my-bucket/data/persons.csv |
| Google Cloud Storage | gs://my-bucket/persons.csv |
| Azure Blob | azb://account/container/persons.csv |

Useful built-in functions inside LOAD CSV

linenumber()   // current line number — use as fallback ID
file()         // absolute path of file being loaded

CALL IN TRANSACTIONS — Full Reference

Syntax

CALL (row) {
  // write logic
} IN [n CONCURRENT] TRANSACTIONS
  [OF batchSize ROW[S]]
  [ON ERROR {CONTINUE | BREAK | FAIL | RETRY [FOR duration SECONDS] [THEN {CONTINUE|BREAK|FAIL}]}]
  [REPORT STATUS AS statusVar]

ON ERROR modes

| Mode | Behavior | Use when |
|---|---|---|
| ON ERROR FAIL | Default. Rolls back the entire outer tx on the first error | All-or-nothing strict import |
| ON ERROR CONTINUE | Skips the failed batch, continues remaining batches | Resilient bulk load — track errors via REPORT STATUS |
| ON ERROR BREAK | Stops after the first failed batch; keeps completed work | Semi-strict: stop early, keep successful batches |
| ON ERROR RETRY | Exponential backoff retry (default 30s) + fallback | Concurrent writes with deadlock risk |

ON ERROR CONTINUE/BREAK → outer transaction succeeds even if inner batches fail. ON ERROR FAIL → cannot be combined with REPORT STATUS AS.

CONCURRENT TRANSACTIONS (parallel batches)

CYPHER 25
LOAD CSV WITH HEADERS FROM 'file:///large.csv' AS row
CALL (row) {
  MERGE (p:Person {id: row.id}) SET p.name = row.name
} IN 4 CONCURRENT TRANSACTIONS OF 5000 ROWS
  ON ERROR RETRY FOR 30 SECONDS THEN CONTINUE
  REPORT STATUS AS s

Use CONCURRENT for read-heavy MERGE on non-overlapping key spaces. Risk: deadlocks on overlapping writes → combine with ON ERROR RETRY.

REPORT STATUS columns

| Column | Type | Meaning |
|---|---|---|
| s.started | BOOLEAN | Batch transaction started |
| s.committed | BOOLEAN | Batch committed successfully |
| s.transactionId | STRING | Transaction ID |
| s.errorMessage | STRING or null | Error detail if the batch failed |

Batch size guidance

| Row count | Recommended batch size | Notes |
|---|---|---|
| < 100k | 10 000 | Default is fine |
| 100k – 1M | 10 000 – 50 000 | Monitor heap; increase if fast |
| 1M – 10M | 50 000 – 100 000 | Enable CONCURRENT if CPUs available |
| > 10M online | 50 000 | Consider neo4j-admin import instead |
| Relationship import | 5 000 | Lower — each batch does 2x MATCH |

neo4j-admin import (Offline Bulk Load)

Fastest method: ~3 min for 31M nodes / 78M rels on SSD. DB must be stopped or non-existent.

Command structure

neo4j-admin database import full \
  --nodes=Person="persons_header.csv,persons.csv" \
  --nodes=Movie="movies_header.csv,movies.csv" \
  --relationships=ACTED_IN="acted_in_header.csv,acted_in.csv" \
  --relationships=DIRECTED="directed_header.csv,directed.csv" \
  --delimiter=, \
  --id-type=STRING \
  --bad-tolerance=0 \
  --threads=$(nproc) \
  --high-parallel-io=on \
  neo4j

For SSDs: always set --high-parallel-io=on. For large graphs (>34B nodes/rels): --format=block.

Dry run (2026.02+) — validate without writing:

neo4j-admin database import full --dry-run ...

Node header file format

# persons_header.csv
personId:ID,name,born:int,score:float,active:boolean,:LABEL
# persons.csv (data file — no header row)
p001,Alice,1985,9.2,true,Person
p002,Bob,1990,7.1,false,Person

| Field | Meaning |
|---|---|
| :ID | Unique ID for relationship wiring (not stored as a property by default) |
| :ID(Group) | Scoped ID space — use when node types share IDs |
| :LABEL | One or more labels; semicolon-separated: Person;Employee |
| prop:int | Typed property; types: int long float double boolean byte short string |
| prop:date | Temporal: date localtime time localdatetime datetime duration |
| prop:int[] | Array — semicolon-separated values in cell: 1;2;3 |
| prop:vector | Float vector (2025.10+) — semicolon-separated coordinates |
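
A sketch of a header combining array and temporal columns (the columns are illustrative; the default array delimiter is the semicolon):

# persons_header.csv
personId:ID,name,tags:string[],born:date,:LABEL
# persons.csv
p001,Alice,engineer;climber,1985-03-14,Person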

Relationship header file format

# acted_in_header.csv
:START_ID(Person),:END_ID(Movie),role,:TYPE
# acted_in.csv
p001,tt0133093,Neo,ACTED_IN
p002,tt0133093,Morpheus,ACTED_IN

:START_ID / :END_ID must reference the same :ID group as the node files.
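
For example, a sketch of matching ID groups across node and relationship headers (assuming Person and Movie IDs could collide):

# persons_header.csv
personId:ID(Person),name,:LABEL
# movies_header.csv
movieId:ID(Movie),title,:LABEL
# acted_in_header.csv: START/END resolve against the matching groups
:START_ID(Person),:END_ID(Movie),role,:TYPE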

Key flags

| Flag | Default | Notes |
|---|---|---|
| --delimiter | , | Single char or TAB |
| --id-type | STRING | STRING / INTEGER / ACTUAL |
| --bad-tolerance | -1 (unlimited, changed 2025.12) | Set 0 for strict prod imports |
| --threads | CPU count | Set explicitly on shared hosts |
| --max-off-heap-memory | 90% RAM | Reduce if other services share the host |
| --high-parallel-io | off | Set on for SSD/NVMe |
| --format | standard | block for >34B nodes/rels |
| --overwrite-destination | false | Required if DB already exists |
| --dry-run | false | 2026.02+ — validate without writing |

Schema file (--schema) [Enterprise, block format]

Pass a Cypher file with CREATE CONSTRAINT / CREATE INDEX statements; executed automatically after import completes. Constraints are created first (correct order enforced). File paths can be local or remote (s3://, gs://, https://).

neo4j-admin database import full \
  --format=block \
  --schema=schema.cypher \
  --nodes=Person="persons_header.csv,persons.csv" \
  neo4j
// schema.cypher
CREATE CONSTRAINT person_id IF NOT EXISTS FOR (n:Person) REQUIRE n.id IS UNIQUE;
CREATE CONSTRAINT movie_id  IF NOT EXISTS FOR (n:Movie)  REQUIRE n.id IS UNIQUE;
CREATE RANGE INDEX person_email IF NOT EXISTS FOR (n:Person) ON (n.email);
CREATE TEXT  INDEX movie_title  IF NOT EXISTS FOR (n:Movie)  ON (n.title);

For incremental import, DROP CONSTRAINT / DROP INDEX are also supported [2025.02+] — used to remove indexes before the merge phase and recreate them after for faster writes.
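
A sketch of such a schema file, assuming an existing person_email index should be dropped before the merge phase and recreated afterwards:

// schema_incremental.cypher (hypothetical example)
DROP INDEX person_email IF EXISTS;
CREATE RANGE INDEX person_email IF NOT EXISTS FOR (n:Person) ON (n.email);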


Incremental import (Enterprise only)

Three-phase process — use when DB must stay online during import preparation:

# Phase 1: Prepare staging area
neo4j-admin database import incremental --stage=prepare \
  --nodes=Person=persons_header.csv,delta.csv --force neo4j

# Phase 2: Build indexes (DB can be read-only during this phase)
neo4j-admin database import incremental --stage=build neo4j

# Phase 3: Merge into live database (brief write-lock)
neo4j-admin database import incremental --stage=merge neo4j

Requires Enterprise Edition + block store format.


APOC Patterns (when APOC is available)

Verify first: RETURN apoc.version() — if fails, use LOAD CSV or driver instead.

apoc.periodic.iterate — batch-process existing graph data

CALL apoc.periodic.iterate(
  "MATCH (p:Person) WHERE NOT (p)-[:HAS_ACCOUNT]->() RETURN p",
  "CREATE (p)-[:HAS_ACCOUNT]->(a:Account {id: randomUUID()})",
  {batchSize: 10000, parallel: false, retries: 2}
) YIELD batches, total, errorMessages
RETURN batches, total, errorMessages

| Config key | Default | Notes |
|---|---|---|
| batchSize | 10000 | Rows per inner transaction |
| parallel | false | Enable for non-overlapping writes; risk: deadlocks |
| retries | 0 | Retry failed batches N times with 100ms delay |

Prefer CALL IN TRANSACTIONS (native Cypher) over apoc.periodic.iterate for new code — it has REPORT STATUS, CONCURRENT, and RETRY built in without APOC dependency.
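
For comparison, a sketch of the same account backfill with native batching (run it as an implicit/auto-commit transaction):

MATCH (p:Person) WHERE NOT (p)-[:HAS_ACCOUNT]->()
CALL (p) {
  CREATE (p)-[:HAS_ACCOUNT]->(a:Account {id: randomUUID()})
} IN TRANSACTIONS OF 10000 ROWS
  ON ERROR CONTINUE
  REPORT STATUS AS s
RETURN s.transactionId, s.committed, s.errorMessage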

apoc.load.csv — load with config options

CALL apoc.load.csv('file:///persons.csv', {
  header: true,
  sep: ',',
  skip: 1,
  limit: 1000000
}) YIELD lineNo, map, list
CALL (map) {
  MERGE (p:Person {id: map.id}) SET p.name = map.name
} IN TRANSACTIONS OF 10000 ROWS ON ERROR CONTINUE

apoc.load.json — load JSON from file or URL

CALL apoc.load.json('https://api.example.com/persons') YIELD value
CALL (value) {
  MERGE (p:Person {id: value.id}) SET p.name = value.name
} IN TRANSACTIONS OF 1000 ROWS ON ERROR CONTINUE

Driver Batch Write Pattern

Use when source is not a file (API responses, DB migrations). Collect into BATCH_SIZE (10 000) lists, call UNWIND $rows AS row MERGE ... per batch. ~10x faster than row-at-a-time. → Python + JS examples
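
A minimal sketch of the per-batch query, assuming each element of $rows is a map with id and name keys (the application code slices the source into BATCH_SIZE chunks and passes each chunk as the $rows parameter of a managed write transaction):

UNWIND $rows AS row
MERGE (p:Person {id: row.id})
ON CREATE SET p.name = row.name
ON MATCH  SET p.updatedAt = datetime()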


MCP Tool Usage

| Operation | MCP tool | Notes |
|---|---|---|
| SHOW CONSTRAINTS, SHOW INDEXES | read-cypher | Always inspect before import |
| CREATE CONSTRAINT, CREATE INDEX | write-cypher | Gate: show planned constraint, confirm |
| LOAD CSV / CALL IN TRANSACTIONS | write-cypher | Gate: show row count + Cypher, confirm |
| Verify counts | read-cypher | Post-import: MATCH (n:Label) RETURN count(n) |
| Poll index state | read-cypher | Poll until all state = 'ONLINE' |

Write gate — before any bulk write via MCP, show:

  1. Query + affected labels
  2. Estimated row count from source
  3. EXPLAIN plan

Wait for user confirmation. Never auto-execute CALL IN TRANSACTIONS or CREATE CONSTRAINT without confirmation.

Always pass database param if not default: {"code": "...", "database": "neo4j"}.


Common Errors

| Error | Cause | Fix |
|---|---|---|
| Couldn't load the external resource | file:/// path not in Neo4j import dir | Move file to $NEO4J_HOME/import/; check dbms.security.allow_csv_import_from_file_urls=true |
| Cannot merge node using null property value | MERGE key resolved to null | Validate row.id IS NOT NULL before MERGE; add WHERE row.id IS NOT NULL |
| toInteger() called on null | Null column fed to non-null-safe fn | Replace toInteger() → toIntegerOrNull(), toFloat() → toFloatOrNull() |
| Node N already exists / constraint violation mid-import | Duplicate source IDs | Dedup source CSV; use MERGE not CREATE; add IF NOT EXISTS to constraint |
| Heap overflow / OutOfMemoryError | Batch too large or file too large | Reduce batch size; switch to CALL IN TRANSACTIONS; neo4j-admin for offline |
| Invalid input 'IN': expected ... | USING PERIODIC COMMIT used | Replace USING PERIODIC COMMIT → CALL IN TRANSACTIONS — PERIODIC COMMIT removed in Cypher 25 |
| neo4j-admin: Bad input data | Wrong header format or type mismatch | Check :ID, :START_ID, :END_ID present; check typed columns parse correctly |
| neo4j-admin: import fails silently | --bad-tolerance default was unlimited pre-2025.12 | Set --bad-tolerance=0 to surface all errors |
| Index not used during MERGE | Constraint not created before import | Drop data, create constraint, re-import |
| Relationship import missing nodes | Relationships imported before nodes | Always import ALL node files before ANY relationship files |

Post-Import Validation

After import completes — run all:

// Row counts per label
MATCH (n:Person) RETURN count(n) AS persons;
MATCH ()-[:KNOWS]->() RETURN count(*) AS knows_rels;

// After import: create additional non-unique indexes (populated async)
CREATE TEXT INDEX movie_title IF NOT EXISTS FOR (n:Movie) ON (n.title);
CREATE RANGE INDEX person_born IF NOT EXISTS FOR (n:Person) ON (n.born);

// Poll population — wait until populationPercent = 100 before opening to queries
SHOW INDEXES YIELD name, state, populationPercent
WHERE state <> 'ONLINE' OR populationPercent < 100
RETURN name, state, populationPercent
ORDER BY populationPercent;

// Spot check: null keys = import bug
MATCH (p:Person) WHERE p.id IS NULL RETURN count(p) AS missing_id;

Do NOT run production queries until all indexes are ONLINE.



Checklist

  • Uniqueness constraints created before any MERGE-based import
  • APOC availability verified if using apoc.* procedures
  • Target confirmed as PRIMARY (not replica)
  • Source files validated: UTF-8 encoding, expected row count, no BOM
  • LOAD CSV uses toIntegerOrNull() / toFloatOrNull() — never bare toInteger()/toFloat()
  • nullIf(row.x, '') applied where empty string ≠ null
  • CALL IN TRANSACTIONS used (not USING PERIODIC COMMIT)
  • ON ERROR CONTINUE + REPORT STATUS for production loads
  • Node import completed before relationship import
  • neo4j-admin: --bad-tolerance=0 set; --high-parallel-io=on for SSD
  • Post-import: row counts match source; all indexes ONLINE
  • Write execution gate applied (MCP): showed query + estimate, got confirmation
  • Credentials in .env; .env in .gitignore