
neo4j-gds-skill

Neo4j Graph Data Science (GDS) plugin — graph projection, algorithm execution, execution modes (stream/stats/mutate/write), memory estimation, and the GDS Python client (graphdatascience v1.21). Use when running gds.pageRank, gds.louvain, gds.wcc, gds.fastRP, gds.knn, gds.betweenness, gds.nodeSimilarity, or any gds.* procedure; projecting named in-memory graphs with gds.graph.project or graph.project; chaining algorithms with mutate mode; computing node embeddings for ML; building recommendation systems with FastRP + KNN. Also triggers on GraphDataScience, GdsSessions, graph catalog operations, ML pipelines, node classification, link prediction. Does NOT cover Aura Graph Analytics serverless sessions — use neo4j-aura-graph-analytics-skill. Does NOT handle Cypher authoring — use neo4j-cypher-skill. Does NOT cover driver setup — use neo4j-driver-python-skill or other driver skill.

Quality: 88%
Impact: no eval scenarios have been run
Security (Snyk): passed, no known issues

SKILL.md

When to Use

  • Running GDS algorithms on self-managed Neo4j or Aura Pro (embedded plugin)
  • Projecting named in-memory graphs, running centrality/community/similarity/path/embedding algorithms
  • Chaining algorithms via mutate mode; building FastRP → KNN pipelines
  • Memory estimation before large graph operations
  • GDS Python client (graphdatascience) workflows

When NOT to Use

  • Aura BC / VDC / Free — GDS plugin unavailable → neo4j-aura-graph-analytics-skill
  • Cypher query authoring → neo4j-cypher-skill
  • Driver/connection setup → neo4j-driver-python-skill
  • GraphRAG retrieval → neo4j-graphrag-skill

| Deployment | Use |
|---|---|
| Aura Free | Upgrade to Pro or use neo4j-aura-graph-analytics-skill |
| Aura Pro | This skill |
| Aura BC / VDC | neo4j-aura-graph-analytics-skill |
| Self-managed (Community or Enterprise) | This skill (install GDS plugin) |

Pre-flight

RETURN gds.version() AS gds_version

If this fails with Unknown function 'gds.version', GDS is not installed or the deployment tier does not include it. Stop and inform the user.

pip install graphdatascience                 # Python client
pip install "graphdatascience[rust_ext]"     # 3–10× faster serialization; quotes stop zsh from globbing the brackets

Compatibility: graphdatascience v1.21 — GDS >= 2.6, Python >= 3.10, Neo4j Driver >= 4.4.12

from graphdatascience import GraphDataScience

# Self-managed Neo4j with the GDS plugin
gds = GraphDataScience("bolt://localhost:7687", auth=("neo4j", "password"))

# Aura instance (aura_ds=True); use ONE of these two connections, not both
gds = GraphDataScience("neo4j+s://xxx.databases.neo4j.io", auth=("neo4j", "pw"), aura_ds=True)

print(gds.server_version())
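The pre-flight check can also be enforced in code. The sketch below gates on the compatibility note's GDS >= 2.6 requirement. The helper names `parse_version` and `check_gds_version` are hypothetical. It assumes you pass the plain version string, for example from `gds.version()` or `RETURN gds.version()`. Pre-release suffixes are ignored.

```python
# Minimal pre-flight gate (sketch). Only the leading dotted integers of the
# version string are compared; suffixes such as "-alpha" are ignored.
import re

MIN_GDS = (2, 6)  # from the compatibility note: GDS >= 2.6


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the leading dotted integers, e.g. '2.6.3-alpha' -> (2, 6, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_gds_version(text: str, minimum: tuple[int, ...] = MIN_GDS) -> None:
    """Raise RuntimeError if the GDS version is below `minimum`."""
    found = parse_version(text)
    # Tuple comparison: (2, 6, 3) >= (2, 6) and (2, 13) > (2, 6)
    if found < minimum:
        wanted = ".".join(str(part) for part in minimum)
        raise RuntimeError(f"GDS {text} is older than required {wanted}; stop and inform the user")


# Usage against a live server (not executed here):
#   check_gds_version(gds.version())
```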

Graph Catalog Operations

Native Projection

CALL gds.graph.project(
  'myGraph',
  ['Person', 'City'],
  { KNOWS: { orientation: 'UNDIRECTED' }, LIVES_IN: {} }
)
YIELD graphName, nodeCount, relationshipCount
# Python: minimal projection
G, result = gds.graph.project("myGraph", "Person", "KNOWS")

# Python: with properties (drop the "myGraph" above first, or pick another name)
G, result = gds.graph.project(
    "myGraph",
    {"Person": {"properties": ["age", "score"]}, "City": {}},
    {"KNOWS": {"orientation": "UNDIRECTED"}, "LIVES_IN": {"properties": ["since"]}}
)

Cypher Projection (use when native can't express filter/transform)

G, result = gds.graph.cypher.project(
    """
    MATCH (source:Person)-[r:KNOWS]->(target:Person)
    WHERE source.active = true
    RETURN gds.graph.project($graph_name, source, target,
        { sourceNodeProperties: source { .score }, relationshipType: 'KNOWS' })
    """,
    database="neo4j", graph_name="activeGraph"
)

Prefer native projection over Cypher projection whenever possible: it is 5–10× faster on large graphs.

Weighted Projection (Cypher projection syntax)

MATCH (source:User)-[r:RATED]->(target:Movie)
WITH gds.graph.project(
  'user-movie-weighted',
  source, target,
  { relationshipProperties: r { .rating } },
  { undirectedRelationshipTypes: ['*'] }
) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount

Relationship Aggregation (collapse parallel relationships into a weighted edge)

MATCH (source:Actor)-[r:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(target:Actor)
WHERE elementId(source) < elementId(target)  // keep each actor pair once, not once per direction
WITH source, target, count(r) AS collabCount
WITH gds.graph.project(
  'actor-network',
  source, target,
  { relationshipProperties: { collabCount: collabCount } },
  { undirectedRelationshipTypes: ['*'] }
) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount

Use count(r) to collapse all paths between the same two actors into a single edge weighted by the number of shared movies. This reduces graph size and enables weight-based algorithms.
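To make the aggregation concrete, here is a pure-Python illustration of the same idea on toy data. Each unordered pair of actors who share a movie becomes one edge, weighted by the number of shared movies. The function `collab_counts` and the sample cast lists are hypothetical and are not part of GDS.

```python
# Toy illustration of the co-acting aggregation: one weighted edge per
# unordered actor pair, weight = number of shared movies.
from collections import Counter
from itertools import combinations


def collab_counts(cast_by_movie: dict[str, list[str]]) -> Counter:
    """Map each unordered actor pair (a, b) with a < b to its shared-movie count."""
    counts: Counter = Counter()
    for cast in cast_by_movie.values():
        # sorted() + combinations() yields each unordered pair exactly once per movie
        for a, b in combinations(sorted(set(cast)), 2):
            counts[(a, b)] += 1
    return counts


edges = collab_counts({
    "The Matrix": ["Keanu", "Carrie", "Laurence"],
    "The Matrix Reloaded": ["Keanu", "Carrie"],
})
```

Here `edges[("Carrie", "Keanu")]` is 2 because they share two movies. Each pair is stored in only one orientation, which is what an undirected projection expects as input.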

Undirected Projection

In native projection, pass orientation: 'UNDIRECTED' per relationship type. In Cypher projection, set undirectedRelationshipTypes: ['*'] in the configuration map (the second map).

Leiden requires undirected relationships. Community detection and similarity algorithms generally work better on undirected graphs.
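A toy model helps explain what an undirected orientation means for the stored graph. Each relationship becomes traversable from both ends, conceptually as an edge plus its reverse. This is an illustration only, since GDS handles it internally. It is also an assumption on my part that this is why an undirected projection's relationship count is typically about twice the stored count; verify against your own G.relationship_count().

```python
# Toy model of orientation 'UNDIRECTED': each stored relationship is kept in
# both directions so algorithms can traverse it from either end.
def to_undirected(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return each (source, target) edge followed by its reverse."""
    result: list[tuple[str, str]] = []
    for source, target in edges:
        result.append((source, target))
        result.append((target, source))
    return result


directed = [("alice", "bob"), ("bob", "carol")]
undirected = to_undirected(directed)
```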

Inspect and Drop

G.node_count()            # 12_043
G.relationship_count()    # 87_211
G.node_properties("Person")  # lists projected + mutated properties
G.memory_usage()          # "45 MiB"
G.exists()
G.drop()                  # always drop after use — frees JVM heap

G = gds.graph.get("myGraph")          # re-attach to existing projection

with gds.graph.project("tmp", "Person", "KNOWS")[0] as G:
    results = gds.pageRank.stream(G)
# dropped automatically
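If you project a graph outside a `with` block, the same guarantee can come from `try`/`finally`. The sketch below wraps any object with a `drop()` method so a failing algorithm call cannot leak JVM heap. The helper `dropping` is hypothetical and not part of graphdatascience.

```python
# Minimal sketch of guaranteed cleanup: drop() is called in `finally`, so it
# runs whether the body succeeds or raises.
from contextlib import contextmanager
from typing import Iterator, Protocol


class Droppable(Protocol):
    def drop(self) -> object: ...


@contextmanager
def dropping(graph: Droppable) -> Iterator[Droppable]:
    """Yield `graph`, then call graph.drop() even if the body raises."""
    try:
        yield graph
    finally:
        graph.drop()


# Usage against a live server (not executed here):
#   G, _ = gds.graph.project("tmp", "Person", "KNOWS")
#   with dropping(G):
#       df = gds.pageRank.stream(G)
```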

Memory Estimation — always run before large projections and algorithms

CALL gds.graph.project.estimate(['Person'], 'KNOWS')
YIELD requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount
est = gds.graph.project.estimate("Person", "KNOWS")
print(est["requiredMemory"])    # e.g. "1234 MiB"

# Algorithm estimation (estimates are per execution mode):
est = gds.pageRank.stream.estimate(G, dampingFactor=0.85)
print(est["requiredMemory"])
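An estimate is only useful if something acts on it. The sketch below is a go/no-go gate based on the numeric bytesMax column that the estimate procedures yield, so there is no need to parse the human-readable requiredMemory string. The function name `fits_in_heap` and the 80% headroom are arbitrary assumptions; tune them for your deployment.

```python
# Sketch of a go/no-go gate on a memory estimate's worst case (bytesMax).
GIB = 1024 ** 3


def fits_in_heap(bytes_max: int, heap_bytes: int, headroom: float = 0.8) -> bool:
    """True if the worst-case estimate stays within `headroom` of the heap."""
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return bytes_max <= heap_bytes * headroom


# Usage against a live server (not executed here):
#   est = gds.graph.project.estimate("Person", "KNOWS")
#   if not fits_in_heap(int(est["bytesMax"]), heap_bytes=8 * GIB):
#       raise RuntimeError("projection would likely exhaust the heap")
```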

Execution Modes

| Mode | Side effect | Returns | Use when |
|---|---|---|---|
| stream | None | One row per node (or node pair) | Inspect results; top-N |
| stats | None | Single aggregate row | Summary / convergence check |
| mutate | Adds property to the in-memory graph only | Stats row | Chain algorithms |
| write | Persists property to the Neo4j database | Stats row | Final step: make results queryable |

Pattern: stream to verify → mutate to chain → write to persist.

mutateProperty must not already exist in the in-memory graph. After write, re-project to use written properties in subsequent GDS calls (in-memory graph does not see DB writes).
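One way to avoid the mutateProperty collision when re-running on the same projection is to pick a free name. The sketch below takes the property names already in the in-memory graph, for example what G.node_properties("Person") reports. It appends _2, _3, and so on until the name is unused. The helper `free_property_name` is hypothetical.

```python
# Hypothetical helper: choose a mutateProperty name not already present in the
# in-memory graph.
from collections.abc import Iterable


def free_property_name(base: str, existing: Iterable[str]) -> str:
    """Return `base` if unused, else the first unused base_2, base_3, ..."""
    taken = set(existing)
    if base not in taken:
        return base
    suffix = 2
    while f"{base}_{suffix}" in taken:
        suffix += 1
    return f"{base}_{suffix}"


# Usage against a live server (not executed here):
#   prop = free_property_name("pagerank", G.node_properties("Person"))
#   gds.pageRank.mutate(G, mutateProperty=prop)
```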


gds.util.asNode() — Enrich Stream Results

stream mode yields nodeId, the Neo4j internal node ID rather than a business key. gds.util.asNode(nodeId) resolves it to the database node so you can read its properties.

// Single property
CALL gds.pageRank.stream('myGraph', {})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10

// Multiple properties — convert once with WITH
CALL gds.pageRank.stream('myGraph', {})
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
RETURN node.name AS name, node.born AS born, score
ORDER BY score DESC LIMIT 10

Not needed for write, mutate, or stats modes — those don't return per-node data.


Core Algorithms

PageRank (centrality)

CALL gds.pageRank.stream('myGraph', { dampingFactor: 0.85, maxIterations: 20 })
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score ORDER BY score DESC LIMIT 10
// score: relative influence, not absolute. Compare scores within the same run only.
// didConverge (yielded by stats/mutate/write, not stream): true means scores stabilized; if false, increase maxIterations.

CALL gds.pageRank.write('myGraph', { writeProperty: 'pagerank', dampingFactor: 0.85 })
YIELD nodePropertiesWritten, ranIterations, didConverge
pr_df = gds.pageRank.stream(G, dampingFactor=0.85)
gds.pageRank.mutate(G, mutateProperty="pagerank", dampingFactor=0.85)
gds.pageRank.write(G, writeProperty="pagerank", dampingFactor=0.85)
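To show what dampingFactor, maxIterations and didConverge mean, here is a textbook PageRank computed by power iteration. This sketch uses the unnormalised form PR(v) = (1 - d) + d · Σ PR(u) / outdeg(u). GDS's exact formula, dangling-node handling and tolerance may differ, so do not expect identical absolute scores. The function `pagerank` is a hypothetical illustration, not the client API.

```python
# Textbook PageRank by power iteration (sketch, not GDS's implementation).
def pagerank(
    out_edges: dict[str, list[str]],
    damping: float = 0.85,
    max_iterations: int = 20,
    tolerance: float = 1e-7,
) -> tuple[dict[str, float], int, bool]:
    """Return (scores, ranIterations, didConverge)."""
    nodes = set(out_edges) | {t for targets in out_edges.values() for t in targets}
    scores = {n: 1.0 for n in nodes}  # start every node at 1.0
    for iteration in range(1, max_iterations + 1):
        incoming = {n: 0.0 for n in nodes}
        for source, targets in out_edges.items():
            if targets:
                share = scores[source] / len(targets)  # split score over out-links
                for target in targets:
                    incoming[target] += share
        new_scores = {n: (1 - damping) + damping * incoming[n] for n in nodes}
        delta = max(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        if delta < tolerance:  # scores stopped changing: converged
            return scores, iteration, True
    return scores, max_iterations, False  # hit maxIterations first
```

On a simple cycle, every node converges to the same score. In a star, the hub that receives the most links ranks highest.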

Louvain (community detection)

CALL gds.louvain.stream('myGraph', { relationshipWeightProperty: 'weight' })  // 'weight' must be a projected relationship property
YIELD nodeId, communityId

CALL gds.louvain.write('myGraph', { writeProperty: 'community' })
YIELD communityCount, modularity
louvain_df = gds.louvain.stream(G)
gds.louvain.write(G, writeProperty="community")

Leiden refines Louvain and avoids poorly connected communities; use it when community quality matters more than raw speed. Louvain's stats and write results include modularity, which ranges from -0.5 to 1.0. Values above 0.3 indicate meaningful community structure, and values above 0.7 indicate strong structure. Leiden requires undirected relationships in the projection.
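The modularity thresholds are easier to read with the formula in hand. For an undirected, unweighted graph with m edges, Q = Σ_c [L_c / m − (d_c / 2m)²]. Here L_c is the number of edges inside community c and d_c is the total degree of its nodes. This sketch computes that quantity. GDS's reported value should correspond to it (weighted when a relationship weight is used), but treat the match as an assumption.

```python
# Newman modularity for an undirected, unweighted edge list (sketch).
from collections import defaultdict


def modularity(edges: list[tuple[int, int]], community: dict[int, int]) -> float:
    m = len(edges)
    internal: dict[int, int] = defaultdict(int)    # L_c: edges inside community c
    degree_sum: dict[int, int] = defaultdict(int)  # d_c: summed degree of c's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3)
bridged = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Splitting the two triangles gives Q = 5/14 ≈ 0.357, above the 0.3 threshold. Putting every node in one community gives Q = 0.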

WCC — Weakly Connected Components

Run WCC first to understand graph structure; partition disconnected graphs before expensive algorithms.

CALL gds.wcc.stream('myGraph', { minComponentSize: 10 })
YIELD nodeId, componentId

CALL gds.wcc.write('myGraph', { writeProperty: 'componentId' })
YIELD nodePropertiesWritten, componentCount
wcc_df = gds.wcc.stream(G)
gds.wcc.write(G, writeProperty="componentId")
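Weakly connected components ignore relationship direction, so a union-find structure is enough to sketch them. The `min_size` filter mirrors the intent of minComponentSize, where only nodes in components with at least that many nodes are reported. Check the GDS docs for the exact semantics. The function `wcc` here is a hypothetical illustration.

```python
# Union-find sketch of weakly connected components with a size filter.
from collections import Counter


def wcc(nodes: list[str], edges: list[tuple[str, str]], min_size: int = 1) -> dict[str, str]:
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:  # direction ignored: (u, v) and (v, u) merge the same sets
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    roots = {n: find(n) for n in nodes}
    sizes = Counter(roots.values())
    return {n: root for n, root in roots.items() if sizes[root] >= min_size}


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
```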

Betweenness Centrality

gds.betweenness.stream(G)          # identifies bottleneck/bridge nodes
gds.betweenness.write(G, writeProperty="betweenness")

Node Similarity

Jaccard similarity from common neighbors — no node properties required.

gds.nodeSimilarity.stream(G, similarityCutoff=0.1, topK=10)
gds.nodeSimilarity.write(G, writeRelationshipType="SIMILAR", writeProperty="score",
                          similarityCutoff=0.1, topK=10)
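The Jaccard score behind Node Similarity is |N(a) ∩ N(b)| / |N(a) ∪ N(b)| over neighbour sets, for example the items each user is connected to. This brute-force sketch applies topK and similarityCutoff per node. Treating the cutoff as inclusive is an assumption of this sketch. GDS also optimizes which pairs it compares, which is not modelled here.

```python
# Brute-force Jaccard node similarity with per-node topK and a score cutoff.
def node_similarity(
    neighbors: dict[str, set[str]], top_k: int = 10, cutoff: float = 0.1
) -> dict[str, list[tuple[str, float]]]:
    results: dict[str, list[tuple[str, float]]] = {}
    for a, na in neighbors.items():
        scored: list[tuple[str, float]] = []
        for b, nb in neighbors.items():
            union = na | nb
            if a == b or not union:
                continue
            score = len(na & nb) / len(union)  # Jaccard index
            if score >= cutoff:
                scored.append((b, score))
        scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, stable ties
        results[a] = scored[:top_k]
    return results


purchases = {
    "alice": {"i1", "i2", "i3"},
    "bob": {"i1", "i2"},
    "carol": {"i4"},
}
```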

FastRP (node embeddings)

Fast and scalable; suitable for production ML pipelines. Set randomSeed for reproducible embeddings.

CALL gds.fastRP.mutate('myGraph', {
  embeddingDimension: 256,
  iterationWeights: [0.0, 1.0, 1.0],
  featureProperties: ['score'],
  propertyRatio: 0.5,
  normalizationStrength: -0.5,
  randomSeed: 42,
  mutateProperty: 'embedding'
})
YIELD nodePropertiesWritten
gds.fastRP.mutate(G, embeddingDimension=256, iterationWeights=[0.0, 1.0, 1.0],
                  randomSeed=42, mutateProperty="embedding")
gds.fastRP.write(G, embeddingDimension=256, writeProperty="embedding", randomSeed=42)
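The core FastRP idea is sparse random vectors refined by repeated neighbour averaging, then combined with per-iteration weights. The heavily simplified sketch below shows why randomSeed makes results reproducible and what iterationWeights control. It is not GDS's implementation, because it omits normalizationStrength, featureProperties and nodeSelfInfluence. The function `fast_rp` is hypothetical and assumes every node appears as a key in `adjacency`.

```python
# Heavily simplified FastRP-style sketch: random projection + neighbour averaging.
import random


def fast_rp(
    adjacency: dict[str, list[str]],
    dim: int,
    iteration_weights: list[float],
    seed: int,
) -> dict[str, list[float]]:
    rng = random.Random(seed)  # same seed -> same random vectors -> same embeddings
    nodes = sorted(adjacency)  # fixed order so the RNG is consumed deterministically
    # Sparse random vectors with entries in {-1, 0, +1} (mostly 0)
    current = {n: [rng.choice((-1.0, 0.0, 0.0, 0.0, 1.0)) for _ in range(dim)] for n in nodes}
    embedding = {n: [0.0] * dim for n in nodes}
    for weight in iteration_weights:
        # One propagation step: each node takes the mean of its neighbours' vectors
        nxt: dict[str, list[float]] = {}
        for n in nodes:
            nbrs = adjacency[n]
            if nbrs:
                nxt[n] = [sum(current[m][i] for m in nbrs) / len(nbrs) for i in range(dim)]
            else:
                nxt[n] = [0.0] * dim
        current = nxt
        # Add this step's vectors, scaled by its iteration weight
        embedding = {n: [e + weight * c for e, c in zip(embedding[n], current[n])] for n in nodes}
    return embedding


graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
```

A weight of 0.0, as in iterationWeights [0.0, 1.0, 1.0], drops that step's contribution from the final embedding.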

KNN — K-Nearest Neighbors

Finds the k most similar nodes for each node, based on node properties (typically embeddings).

CALL gds.knn.stream('myGraph', {
  nodeProperties: ['embedding'], topK: 10,
  sampleRate: 0.5, similarityCutoff: 0.7
})
YIELD node1, node2, similarity

CALL gds.knn.write('myGraph', {
  nodeProperties: ['embedding'], topK: 10,
  writeRelationshipType: 'SIMILAR', writeProperty: 'score'
})
YIELD relationshipsWritten
knn_df = gds.knn.stream(G, nodeProperties=["embedding"], topK=10)
gds.knn.write(G, nodeProperties=["embedding"], topK=10,
              writeRelationshipType="SIMILAR", writeProperty="score")
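This is an exact, brute-force version of what KNN computes, using cosine similarity over embedding vectors with topK and a similarity cutoff. GDS's KNN is approximate: it samples candidate neighbours, which is what sampleRate tunes. It is an assumption of this sketch that plain cosine matches the metric and scaling GDS applies to your property type, so check the docs. The functions here are hypothetical.

```python
# Exact brute-force k-nearest neighbours by cosine similarity (sketch).
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # zero vectors have no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def knn(
    embeddings: dict[str, list[float]], top_k: int = 10, cutoff: float = 0.0
) -> dict[str, list[tuple[str, float]]]:
    result: dict[str, list[tuple[str, float]]] = {}
    for a, va in embeddings.items():
        scored = [(b, cosine(va, vb)) for b, vb in embeddings.items() if b != a]
        scored = [pair for pair in scored if pair[1] >= cutoff]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        result[a] = scored[:top_k]
    return result


vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
```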

FastRP → KNN Pipeline (recommendation)

# 1. Project
G, _ = gds.graph.project("myGraph", "Product",
    {"BOUGHT_TOGETHER": {"orientation": "UNDIRECTED"}})

# 2. Estimate memory
print(gds.fastRP.stream.estimate(G, embeddingDimension=128)["requiredMemory"])

# 3. Embed
gds.fastRP.mutate(G, embeddingDimension=128, randomSeed=42, mutateProperty="emb")

# 4. Similarity
gds.knn.write(G, nodeProperties=["emb"], topK=10,
              writeRelationshipType="SIMILAR", writeProperty="score")

# 5. Cleanup — always
G.drop()

Algorithm Selection

| Goal | Algorithm |
|---|---|
| Influence via network links | PageRank / ArticleRank |
| Bottleneck / bridge nodes | Betweenness Centrality |
| Direct connections | Degree Centrality |
| Community (general, fast) | Louvain |
| Community (higher quality) | Leiden |
| Is graph connected? | WCC (run first) |
| Similarity from embeddings | KNN |
| Similarity from neighbors | Node Similarity |
| Shortest path (positive weights) | Dijkstra / A* |
| k alternative paths | Yen's |
| Fast scalable embeddings | FastRP |
| Feature-rich nodes | GraphSAGE (Beta) |
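The weight restriction on Dijkstra comes from its greedy settle order. Once the cheapest unsettled node is popped, no later path can be cheaper, but that argument fails if a weight can be negative. This textbook sketch makes the restriction explicit. It is a generic illustration, not the GDS procedure.

```python
# Textbook Dijkstra with a binary heap; rejects negative weights explicitly.
import heapq
import math


def dijkstra(
    graph: dict[str, list[tuple[str, float]]], source: str, target: str
) -> tuple[float, list[str]]:
    """Return (total cost, node path); (inf, []) if target is unreachable."""
    dist = {source: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, source)]
    settled: set[str] = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale heap entry
        settled.add(node)  # greedy step: d is final for this node
        if node == target:
            break
        for nbr, weight in graph.get(node, []):
            if weight < 0:
                raise ValueError("Dijkstra requires non-negative weights")
            candidate = d + weight
            if candidate < dist.get(nbr, math.inf):
                dist[nbr] = candidate
                prev[nbr] = node
                heapq.heappush(heap, (candidate, nbr))
    if target not in dist:
        return math.inf, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


roads = {"A": [("B", 1.0), ("C", 5.0)], "B": [("C", 2.0)], "C": []}
```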

Full algorithm catalog → references/algorithms.md


Common Errors

| Error | Cause | Fix |
|---|---|---|
| Unknown function 'gds.version' | GDS not installed / wrong tier | Install the plugin; on Aura BC/VDC use neo4j-aura-graph-analytics-skill |
| Insufficient heap memory / OOM | Graph too large for available JVM heap | Run gds.graph.project.estimate first; increase the heap (server.memory.heap.max_size in Neo4j 5, dbms.memory.heap.max_size in Neo4j 4.x) |
| Procedure not found: gds.leiden | Algorithm not licensed / older GDS | Check CALL gds.list() for available procedures; upgrade GDS or use Louvain |
| Node property 'X' not found after mutate | Property not projected or wrong graph name | Verify G.node_properties("Label") includes the property; check mutateProperty spelling |
| Graph 'myGraph' already exists | Leftover projection from a failed run | CALL gds.graph.drop('myGraph') or G.drop() |
| mutateProperty already exists | Re-running an algorithm on the same projection | Drop and re-project, or use a different mutateProperty name |
| No algorithm results | Source/target node not in projection | Verify node labels/rel types match the projection; check G.node_count() |
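For the "already exists" row, re-runs can be made idempotent by dropping a leftover projection before projecting again. This sketch assumes gds.graph.exists(name) returns a row with an "exists" field and that gds.graph.get(name).drop() removes the graph; confirm both against the client docs for your version. The helper `drop_if_exists` is hypothetical.

```python
# Hypothetical helper: drop a leftover named graph so a re-run can re-project it.
def drop_if_exists(gds, graph_name: str) -> bool:
    """Drop `graph_name` from the graph catalog if present; return True if dropped."""
    if gds.graph.exists(graph_name)["exists"]:
        gds.graph.get(graph_name).drop()
        return True
    return False


# Usage against a live server (not executed here):
#   drop_if_exists(gds, "myGraph")
#   G, _ = gds.graph.project("myGraph", "Person", "KNOWS")
```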

Full Workflow

# 0. Verify
print(gds.server_version())

# 1. Estimate
est = gds.graph.project.estimate("Person", "KNOWS")
print(est["requiredMemory"])

# 2. Project
G, _ = gds.graph.project("myGraph", "Person",
    {"KNOWS": {"orientation": "UNDIRECTED"}})
print(G.node_count(), G.relationship_count())

# 3. Stream to verify
df = gds.pageRank.stream(G)
print(df.sort_values("score", ascending=False).head(10))

# 4. Write when satisfied
gds.pageRank.write(G, writeProperty="pagerank", dampingFactor=0.85)

# 5. Drop — frees JVM heap
G.drop()

Built-in test datasets: gds.graph.load_cora(), gds.graph.load_karate_club(), gds.graph.load_imdb()


MCP Tool Mapping

| Operation | MCP tool |
|---|---|
| RETURN gds.version() | read-cypher |
| gds.pageRank.stream(...) | read-cypher |
| gds.pageRank.write(...) | write-cypher |
| gds.graph.drop(...) | write-cypher |
| List available procedures | read-cypher (CALL gds.list()) |

References

  • references/algorithms.md — full algorithm catalog: all procedures, parameters, tiers, Cypher + Python examples
  • references/graph-projection.md — projection deep-dive: filtering, heterogeneous graphs, relationship orientation, property types
  • GDS Manual
  • Python Client Docs

Checklist

  • gds.version() confirmed — GDS installed and licensed
  • Memory estimated before large projections and expensive algorithms
  • Named graph dropped after use (G.drop() or context manager)
  • Execution mode chosen: stream (inspect) → mutate (chain) → write (persist)
  • writeProperty/mutateProperty checked for collision with existing properties
  • randomSeed set for reproducible embeddings
  • WCC run first on graphs that may be disconnected
  • Native projection used over Cypher projection unless filtering/transformation required

Repository: neo4j-contrib/neo4j-skills