CtrlK
BlogDocsLog inGet started
Tessl Logo

neo4j-gds-skill

Neo4j Graph Data Science (GDS) embedded plugin via Python client or Cypher — covers GraphDataScience, gds.v2 plugin endpoints, gds.version, native projection, Cypher projection, graph catalog operations, stream/stats/mutate/write modes, memory estimation, PageRank, Louvain, WCC, FastRP, KNN, Node Similarity, ML pipelines, and cleanup. Use for Aura Pro, self-managed, local, or offline Neo4j DBMS with the GDS plugin installed. Does NOT cover Aura Graph Analytics GDS Sessions, AuraGraphDataScience, GdsSessions, gds.graph.project.remote, or AuraDB Cypher API projection/session management — use neo4j-aura-graph-analytics-skill. Does NOT handle Cypher authoring — use neo4j-cypher-skill. Does NOT cover driver setup — use neo4j-driver-python-skill or other driver skill.

72

Quality

88%

Does it follow best practices?

Impact

No eval scenarios have been run

SecuritybySnyk

Passed

No known issues

SKILL.md
Quality
Evals
Security

When to Use

  • Running GDS algorithms against embedded GDS plugin through Python client (graphdatascience)
  • Running GDS algorithms through CALL gds.* Cypher procedures
  • Aura Pro, self-managed Neo4j, local Neo4j, or offline DBMS with GDS plugin installed
  • Projecting named in-memory graphs, running centrality/community/similarity/path/embedding algorithms
  • Chaining algorithms via mutate mode; building FastRP → KNN pipelines
  • Writing node embeddings for Neo4j vector indexes / structural similarity search
  • Memory estimation before large graph operations

When NOT to Use

  • Aura Graph Analytics Sessions / AGA / GdsSessions / AuraGraphDataScienceneo4j-aura-graph-analytics-skill
  • AuraDB Cypher API with { memory: ... } or { sessionId: ... }neo4j-aura-graph-analytics-skill
  • Cypher query authoringneo4j-cypher-skill
  • Driver/connection setupneo4j-driver-python-skill
  • GraphRAG retrievalneo4j-graphrag-skill
  • Creating/querying vector indexes over written embeddingsneo4j-vector-index-skill
ContextUse
Aura Pro with GDS pluginThis skill
Self-managed/local/offline Neo4j with GDS pluginThis skill
AuraDB serverless analytics sessionneo4j-aura-graph-analytics-skill
Self-managed Neo4j attached to AGA sessionneo4j-aura-graph-analytics-skill
Non-Neo4j data sourceneo4j-aura-graph-analytics-skill

Pre-flight

Use only with embedded GDS plugin.

from graphdatascience import GraphDataScience

gds = GraphDataScience("neo4j+s://xxx.databases.neo4j.io", auth=("neo4j", "pw"), aura_ds=True)
gds = GraphDataScience("bolt://localhost:7687", auth=("neo4j", "password"))
print(gds.server_version())
RETURN gds.version() AS gds_version

If Unknown function 'gds.version' → GDS plugin unavailable. AuraDB serverless analytics → neo4j-aura-graph-analytics-skill. Self-managed/local → install or enable GDS plugin.

pip install graphdatascience              # Python client
pip install graphdatascience[rust_ext]    # 3–10× faster serialization

Compatibility: graphdatascience v1.22 — GDS >= 2.6 and < 2.28 / < 2026.6, Python >= 3.10 and < 3.15, Neo4j Driver >= 4.4.12 and < 7.0.

V2 rules:

  • Prefer gds.v2.* when endpoint exists.
  • Use snake_case endpoints and parameters: page_rank, fast_rp, mutate_property, write_property.
  • Use typed result attributes: result.write_millis, not result["writeMillis"].
  • Use v1 if v2 endpoint missing/incompatible; label fallback.

Graph Catalog Operations

Native Projection

CALL gds.graph.project(
  'myGraph',
  ['Person', 'City'],
  { KNOWS: { orientation: 'UNDIRECTED' }, LIVES_IN: {} }
)
YIELD graphName, nodeCount, relationshipCount
G, result = gds.v2.graph.project("myGraph", "Person", "KNOWS")
print(result.node_count, result.relationship_count)

G, result = gds.v2.graph.project(
    "myGraph",
    {"Person": {"properties": ["age", "score"]}, "City": {}},
    {"KNOWS": {"orientation": "UNDIRECTED"}, "LIVES_IN": {"properties": ["since"]}}
)

Native projection: plugin/simple Python-client workflow only. AGA Sessions → neo4j-aura-graph-analytics-skill. V1 fallback: gds.graph.project(...).

Cypher Projection (use for new Cypher workflows, filters, transforms)

G, result = gds.graph.cypher.project(
    """
    MATCH (source:Person)-[r:KNOWS]->(target:Person)
    WHERE source.active = true
    RETURN gds.graph.project($graph_name, source, target,
        { sourceNodeProperties: source { .score }, relationshipType: 'KNOWS' })
    """,
    database="neo4j", graph_name="activeGraph"
)

gds.graph.cypher.project must end with one RETURN gds.graph.project(...) clause. If validation fails: use gds.run_cypher(...), then gds.graph.get("graphName"). Use v1 gds.graph.cypher.project(...) if v2 graph projection cannot express required filter/transform.

AGA Sessions → neo4j-aura-graph-analytics-skill; never use plugin Cypher projection.

Undirected Projection

Native projection: set orientation: 'UNDIRECTED' per relationship type. Plugin Cypher projection: set undirectedRelationshipTypes: ['*'] in fifth gds.graph.project(...) config argument.

Leiden is defined for directed and undirected graphs. Project undirected relationships when community structure is naturally symmetric.

Inspect and Drop

G.node_count()              # 12_043
G.relationship_count()      # 87_211
G.node_properties()         # projected + mutated properties by label
G.relationship_properties() # projected + mutated properties by type
G.size_in_bytes()
gds.v2.graph.drop(G)        # frees JVM heap

G = gds.v2.graph.get("myGraph")       # re-attach to existing projection

gds.v2.graph.list()

Memory Estimation — run before large projections and algorithms

CALL gds.graph.project.estimate(['Person'], 'KNOWS')
YIELD requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount
G, project_result = gds.v2.graph.project("myGraph", "Person", "KNOWS")
print(project_result.node_count)

# Algorithm estimation:
est = gds.v2.page_rank.estimate(G, damping_factor=0.85)
print(est.required_memory)

Projection estimate fallback: use v1 gds.graph.project.estimate(...) if v2 estimate endpoint unavailable.


Execution Modes

ModeSide effectReturnsUse when
streamNoneRow per node/pairInspect results; top-N
statsNoneSingle aggregate rowSummary/convergence check
mutateAdds node property or relationship type/property to in-memory graph onlyStats rowChain algorithms
writePersists node property or relationship to Neo4j DBStats rowFinal step — make queryable

Pattern: stream to verify → mutate to chain → write to persist.

mutate_property must not exist in the in-memory graph. Relationship algorithms such as KNN also require mutate_relationship_type. After write, re-project to use written properties in subsequent GDS calls (in-memory graph does not see DB writes).


gds.util.asNode() — Enrich Stream Results

stream mode yields nodeId (internal GDS integer). gds.util.asNode(nodeId) translates it back to the DB node so you can access properties.

// Single property
CALL gds.pageRank.stream('myGraph', {})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10

// Multiple properties — convert once with WITH
CALL gds.pageRank.stream('myGraph', {})
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
RETURN node.name AS name, node.born AS born, score
ORDER BY score DESC LIMIT 10

Not needed for write, mutate, or stats modes — those don't return per-node data.


Core Algorithms

PageRank (centrality)

CALL gds.pageRank.stream('myGraph', { dampingFactor: 0.85, maxIterations: 20 })
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score ORDER BY score DESC LIMIT 10
// score: relative influence — not absolute. Compare within same run only.
// didConverge: true means score stabilized; if false, increase maxIterations.

CALL gds.pageRank.write('myGraph', { writeProperty: 'pagerank', dampingFactor: 0.85 })
YIELD nodePropertiesWritten, ranIterations, didConverge
pr_df = gds.v2.page_rank.stream(G, damping_factor=0.85)
mutate_result = gds.v2.page_rank.mutate(G, mutate_property="pagerank", damping_factor=0.85)
write_result = gds.v2.page_rank.write(G, write_property="pagerank", damping_factor=0.85)
print(write_result.write_millis)

Louvain (community detection)

CALL gds.louvain.stream('myGraph', { relationshipWeightProperty: 'weight' })
YIELD nodeId, communityId

CALL gds.louvain.write('myGraph', { writeProperty: 'community' })
YIELD communityCount, modularity
louvain_df = gds.v2.louvain.stream(G)
write_result = gds.v2.louvain.write(G, write_property="community")
print(write_result.community_count)

Leiden is a refinement of Louvain avoiding poorly connected communities — use when community quality > raw speed. modularity in stats result: range -0.5 to 1.0. [field] Values > 0.3 often indicate meaningful community structure; > 0.7 is strong. Leiden is defined for directed and undirected graphs. Project undirected relationships when community structure is naturally symmetric.

WCC — Weakly Connected Components

Run WCC first to understand graph structure; partition disconnected graphs before expensive algorithms.

CALL gds.wcc.stream('myGraph', { minComponentSize: 10 })
YIELD nodeId, componentId

CALL gds.wcc.write('myGraph', { writeProperty: 'componentId' })
YIELD nodePropertiesWritten, componentCount
wcc_df = gds.v2.wcc.stream(G)
write_result = gds.v2.wcc.write(G, write_property="componentId")
print(write_result.node_properties_written)

Betweenness Centrality

gds.v2.betweenness_centrality.stream(G)          # identifies bottleneck/bridge nodes
gds.v2.betweenness_centrality.write(G, write_property="betweenness")

Node Similarity

Jaccard similarity from common neighbors — no node properties required.

gds.v2.node_similarity.stream(G, similarity_cutoff=0.1, top_k=10)
gds.v2.node_similarity.write(G, write_relationship_type="SIMILAR", write_property="score",
                             similarity_cutoff=0.1, top_k=10)

FastRP (node embeddings)

Fast, scalable, production ML pipelines. Set randomSeed for reproducibility.

CALL gds.fastRP.mutate('myGraph', {
  embeddingDimension: 256,
  iterationWeights: [0.0, 1.0, 1.0],
  featureProperties: ['score'],
  propertyRatio: 0.5,
  normalizationStrength: -0.5,
  randomSeed: 42,
  mutateProperty: 'embedding'
})
YIELD nodePropertiesWritten
gds.v2.fast_rp.mutate(G, embedding_dimension=256, iteration_weights=[0.0, 1.0, 1.0],
                      random_seed=42, mutate_property="embedding")
write_result = gds.v2.fast_rp.write(G, embedding_dimension=256, write_property="embedding",
                                    random_seed=42)
print(write_result.write_millis)

For ANN search over structural embeddings, after write, create a Neo4j vector index over the written property. Use neo4j-vector-index-skill.

KNN — K-Nearest Neighbors

Finds k most similar nodes per node based on node properties (typically embeddings).

CALL gds.knn.stream('myGraph', {
  nodeProperties: ['embedding'], topK: 10,
  sampleRate: 0.5, similarityCutoff: 0.7
})
YIELD node1, node2, similarity

CALL gds.knn.write('myGraph', {
  nodeProperties: ['embedding'], topK: 10,
  writeRelationshipType: 'SIMILAR', writeProperty: 'score'
})
YIELD relationshipsWritten
knn_df = gds.v2.knn.stream(G, node_properties=["embedding"], top_k=10)
gds.v2.knn.write(G, node_properties=["embedding"], top_k=10,
                 write_relationship_type="SIMILAR", write_property="score")

FastRP → KNN Pipeline (recommendation)

# 1. Project
G, _ = gds.v2.graph.project("myGraph", "Product",
    {"BOUGHT_TOGETHER": {"orientation": "UNDIRECTED"}})

# 2. Estimate memory
print(gds.v2.fast_rp.estimate(G, embedding_dimension=128).required_memory)

# 3. Embed
gds.v2.fast_rp.mutate(G, embedding_dimension=128, random_seed=42, mutate_property="emb")

# 4. Similarity
gds.v2.knn.write(G, node_properties=["emb"], top_k=10,
                 write_relationship_type="SIMILAR", write_property="score")

# 5. Cleanup
gds.v2.graph.drop(G)

Algorithm Selection

GoalAlgorithm
Influence via network linksPageRank / ArticleRank
Bottleneck / bridge nodesBetweenness Centrality
Direct connectionsDegree Centrality
Community (general, fast)Louvain
Community (higher quality)Leiden
Is graph connected?WCC (run first)
Similarity from embeddingsKNN
Similarity from neighborsNode Similarity
Shortest path (positive weights)Dijkstra / A*
k alternative pathsYen's
Fast scalable embeddingsFastRP
Feature-rich nodesGraphSAGE (gds.beta.graphSage)

Full algorithm catalog → references/algorithms.md


Common Errors

ErrorCauseFix
Unknown function 'gds.version'Embedded GDS plugin unavailableAGA → neo4j-aura-graph-analytics-skill; self-managed/local → install plugin
Insufficient heap memory / OOMGraph too large for available JVM heapRun gds.graph.project.estimate; increase dbms.memory.heap.max_size
Procedure not found: gds.leidenOlder or incompatible GDSCheck CALL gds.list() for available procedures; upgrade GDS or use Louvain
Node property 'X' not found after mutateProperty not projected or wrong graph nameVerify G.node_properties() includes the property; check mutate_property spelling
Graph 'myGraph' already existsLeftover projection from failed runCALL gds.graph.drop('myGraph') or gds.v2.graph.drop(G)
mutate_property already existsRe-running algorithm on same projectionDrop and re-project, or use different mutate_property name
No algorithm resultsSource/target node not in projectionVerify node labels/rel types match projection; check G.node_count()

Full Workflow

  1. Create gds with GraphDataScience(...).
  2. Verify plugin: gds.server_version() or RETURN gds.version().
  3. Estimate memory: gds.graph.project.estimate(...) and algorithm .estimate(...).
  4. Project named graph with gds.v2.graph.project(...).
  5. Run gds.v2.*.stream first; switch to mutate; use write only when satisfied.
  6. Drop graph with gds.v2.graph.drop(G).
  7. Use v1 only for endpoints missing in v2, such as plugin Cypher projection.

Built-in test datasets: gds.v2.graph.datasets.load_cora(), gds.v2.graph.datasets.load_karate_club(), gds.v2.graph.datasets.load_imdb()


MCP Tool Mapping

OperationMCP tool
RETURN gds.version()read-cypher
gds.pageRank.stream(...)read-cypher
gds.pageRank.write(...)write-cypher
gds.graph.drop(...)write-cypher
List available proceduresread-cypherCALL gds.list()

Before any write-cypher: show exact Cypher, expected nodes/relationships affected, and ask for confirmation. For algorithm write mode, estimate or run stats first when available.


References

  • references/algorithms.md — full algorithm catalog: all procedures, parameters, tiers, Cypher + Python examples
  • references/graph-projection.md — projection deep-dive: filtering, heterogeneous graphs, relationship orientation, property types
  • GDS Manual
  • Python Client Docs

Checklist

  • Embedded GDS plugin confirmed with gds.version() or gds.server_version()
  • Graph/algorithm memory estimated before large work
  • Python examples prefer gds.v2.*, snake_case params, typed result attributes
  • v1 APIs used only as explicit fallback
  • Projection uses native or plugin Cypher projection; no gds.graph.project.remote(...)
  • Named graph dropped after use (gds.v2.graph.drop(G) or v1 fallback)
  • Execution mode chosen: stream (inspect) → mutate (chain) → write (persist)
  • write_property/mutate_property checked for collision with existing properties
  • randomSeed set for reproducible embeddings
  • WCC run first on graphs that may be disconnected
Repository
neo4j-contrib/neo4j-skills
Last updated
Created

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.