
GDS Sessions for Self-Managed Neo4j DB

This Jupyter notebook is hosted in the Neo4j Graph Data Science Client GitHub repository.

The notebook shows how to use the graphdatascience Python library to create, manage, and use a GDS Session.

We consider a graph of people and fruits, which we’re using as a simple example to show how to connect your self-managed Neo4j database to a GDS Session, run algorithms, and eventually write back your analytical results to your Neo4j database. We will cover all management operations: creation, listing, and deletion.

If you are using AuraDB, follow this example.

1. Prerequisites

This notebook requires a Neo4j instance to be available and the GDS Sessions feature to be enabled for your Neo4j Aura tenant. Contact your account manager to get the feature enabled.

We also need to have the graphdatascience Python library installed, version 1.12a1 or later.

from datetime import timedelta

%pip install "graphdatascience>=1.12a1"

2. Aura API credentials

A GDS Session is managed via the Aura API. In order to use the Aura API, we need to have Aura API credentials.

Using these credentials, we can create our GdsSessions object, which is the main entry point for managing GDS Sessions.

import os

from graphdatascience.session import AuraAPICredentials, GdsSessions

client_id = os.environ["AURA_API_CLIENT_ID"]
client_secret = os.environ["AURA_API_CLIENT_SECRET"]

# If your account is a member of several tenants, you must also specify the tenant ID to use
tenant_id = os.environ.get("AURA_API_TENANT_ID", None)

sessions = GdsSessions(api_credentials=AuraAPICredentials(client_id, client_secret, tenant_id=tenant_id))
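Since `os.environ[...]` raises a bare `KeyError` when a variable is missing, it can help to validate the credentials before constructing the `GdsSessions` object. The following is a minimal sketch using only the standard library. The helper names `require_env` and `read_aura_settings` are hypothetical and not part of the `graphdatascience` library; the variable names are the ones used above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a readable message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def read_aura_settings() -> dict:
    """Collect the Aura API settings used in this notebook (hypothetical helper)."""
    return {
        "client_id": require_env("AURA_API_CLIENT_ID"),
        "client_secret": require_env("AURA_API_CLIENT_SECRET"),
        # Optional: only needed when the account belongs to several tenants.
        "tenant_id": os.environ.get("AURA_API_TENANT_ID"),
    }
```

The returned values could then be passed to `AuraAPICredentials` in the same way as `client_id`, `client_secret`, and `tenant_id` above.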

3. Creating a new session

A new session is created by calling sessions.get_or_create(). As the data source, we assume that a self-managed Neo4j DBMS instance has been set up and is accessible. We need to pass the database address, user name and password to the DbmsConnectionInfo class.

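We also need to specify the session size. Please refer to the API reference documentation or the manual for a full list of available sizes. In the code below, the size is derived by calling sessions.estimate() with the expected node count, relationship count, and algorithm categories.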

Finally, we need to give our session a name. We will call ours people-and-fruits-sm. It is possible to reconnect to an existing session by calling get_or_create with the same session name and configuration.

We will also set a time-to-live (TTL) for the session. This ensures that our session is automatically deleted after being unused for 5 hours. This is a good practice to avoid incurring costs should we forget to delete the session ourselves.

import os

from graphdatascience.session import AlgorithmCategory, CloudLocation, DbmsConnectionInfo

# Identify the Neo4j DBMS
db_connection = DbmsConnectionInfo(
    uri=os.environ["NEO4J_URI"], username=os.environ["NEO4J_USER"], password=os.environ["NEO4J_PASSWORD"]
)
# Specify where to create the GDS session
cloud_location = CloudLocation(provider="gcp", region="europe-west1")

# Estimate the session memory needed for the expected graph size and algorithms
memory = sessions.estimate(
    node_count=20,
    relationship_count=50,
    algorithm_categories=[AlgorithmCategory.CENTRALITY, AlgorithmCategory.NODE_EMBEDDING],
)
# Create a GDS session!
gds = sessions.get_or_create(
    # we give it a representative name
    session_name="people-and-fruits-sm",
    memory=memory,
    db_connection=db_connection,
    ttl=timedelta(hours=5),
    cloud_location=cloud_location,
)
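To make the TTL concrete, the arithmetic can be sketched with the standard library alone. This assumes, as described above, that the session is removed once it has been unused for the full TTL. The function `expires_at` and the timestamp are illustrative only and not part of the `graphdatascience` API.

```python
from datetime import datetime, timedelta, timezone


def expires_at(last_used: datetime, ttl: timedelta) -> datetime:
    """Earliest time at which a session unused since `last_used` would be removed."""
    return last_used + ttl


ttl = timedelta(hours=5)
last_used = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)  # made-up timestamp
deadline = expires_at(last_used, ttl)
print(deadline.isoformat())  # 2024-01-01T14:00:00+00:00
```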

4. Listing sessions

Now that we have created a session, let’s list all our sessions to see what that looks like:

gds_sessions = sessions.list()

5. Adding a dataset

We assume that the configured Neo4j database instance is empty. We will add our dataset using standard Cypher.

In a more realistic scenario, this step is already done, and we would just connect to the existing database.

data_query = """
  CREATE
    (dan:Person {name: 'Dan',     age: 18, experience: 63, hipster: 0}),
    (annie:Person {name: 'Annie', age: 12, experience: 5, hipster: 0}),
    (matt:Person {name: 'Matt',   age: 22, experience: 42, hipster: 0}),
    (jeff:Person {name: 'Jeff',   age: 51, experience: 12, hipster: 0}),
    (brie:Person {name: 'Brie',   age: 31, experience: 6, hipster: 0}),
    (elsa:Person {name: 'Elsa',   age: 65, experience: 23, hipster: 1}),
    (john:Person {name: 'John',   age: 4, experience: 100, hipster: 0}),

    (apple:Fruit {name: 'Apple',   tropical: 0, sourness: 0.3, sweetness: 0.6}),
    (banana:Fruit {name: 'Banana', tropical: 1, sourness: 0.1, sweetness: 0.9}),
    (mango:Fruit {name: 'Mango',   tropical: 1, sourness: 0.3, sweetness: 1.0}),
    (plum:Fruit {name: 'Plum',     tropical: 0, sourness: 0.5, sweetness: 0.8})

  CREATE
    (dan)-[:LIKES]->(apple),
    (annie)-[:LIKES]->(banana),
    (matt)-[:LIKES]->(mango),
    (jeff)-[:LIKES]->(mango),
    (brie)-[:LIKES]->(banana),
    (elsa)-[:LIKES]->(plum),
    (john)-[:LIKES]->(plum),

    (dan)-[:KNOWS]->(annie),
    (dan)-[:KNOWS]->(matt),
    (annie)-[:KNOWS]->(matt),
    (annie)-[:KNOWS]->(jeff),
    (annie)-[:KNOWS]->(brie),
    (matt)-[:KNOWS]->(brie),
    (brie)-[:KNOWS]->(elsa),
    (brie)-[:KNOWS]->(jeff),
    (john)-[:KNOWS]->(jeff);
"""

# making sure the database is actually empty
assert gds.run_cypher("MATCH (n) RETURN count(n)").squeeze() == 0, "Database is not empty!"

# let's now write our graph!
gds.run_cypher(data_query)

gds.run_cypher("MATCH (n) RETURN count(n) AS nodeCount")

6. Projecting Graphs

Now that we have imported a graph to our database, we can project it into our GDS Session. We do that by using the gds.graph.project() endpoint.

The remote projection query that we are using selects all Person nodes and their KNOWS relationships, and all Fruit nodes and their LIKES relationships. Additionally, we project node properties for illustrative purposes. We can use these node properties as input to algorithms, although we do not do that in this notebook.

G, result = gds.graph.project(
    "people-and-fruits",
    """
    CALL {
        MATCH (p1:Person)
        OPTIONAL MATCH (p1)-[r:KNOWS]->(p2:Person)
        RETURN
          p1 AS source, r AS rel, p2 AS target,
          p1 {.age, .experience, .hipster } AS sourceNodeProperties,
          p2 {.age, .experience, .hipster } AS targetNodeProperties
        UNION
        MATCH (f:Fruit)
        OPTIONAL MATCH (f)<-[r:LIKES]-(p:Person)
        RETURN
          p AS source, r AS rel, f AS target,
          p {.age, .experience, .hipster } AS sourceNodeProperties,
          f { .tropical, .sourness, .sweetness } AS targetNodeProperties
    }
    RETURN gds.graph.project.remote(source, target, {
      sourceNodeProperties: sourceNodeProperties,
      targetNodeProperties: targetNodeProperties,
      sourceNodeLabels: labels(source),
      targetNodeLabels: labels(target),
      relationshipType: type(rel)
    })
    """,
)

str(G)

7. Running Algorithms

We can now run algorithms on the projected graph. This is done using the standard GDS Python Client API. There are many other tutorials covering some interesting things we can do at this step, so we will keep it rather brief here.

We will simply run PageRank and FastRP on the graph.

print("Running PageRank ...")
pr_result = gds.pageRank.mutate(G, mutateProperty="pagerank")
print(f"Compute millis: {pr_result['computeMillis']}")
print(f"Node properties written: {pr_result['nodePropertiesWritten']}")
print(f"Centrality distribution: {pr_result['centralityDistribution']}")

print("Running FastRP ...")
frp_result = gds.fastRP.mutate(
    G,
    mutateProperty="fastRP",
    embeddingDimension=8,
    featureProperties=["pagerank"],
    propertyRatio=0.2,
    nodeSelfInfluence=0.2,
)
print(f"Compute millis: {frp_result['computeMillis']}")
# stream back the results
gds.graph.nodeProperties.stream(G, ["pagerank", "fastRP"], separate_property_columns=True, db_node_properties=["name"])
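For readers who want an intuition for what a PageRank score expresses, here is a small textbook power-iteration sketch in plain Python. It is not the GDS implementation: it uses only the KNOWS relationships from the dataset (the projected graph also contains LIKES), the `damping` and `iterations` values are choices made for this sketch, and GDS may scale scores and treat nodes without outgoing relationships differently, so the absolute numbers are not expected to match the session output.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Textbook PageRank by power iteration over directed (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# The KNOWS relationships from the dataset above.
knows = [
    ("Dan", "Annie"), ("Dan", "Matt"), ("Annie", "Matt"), ("Annie", "Jeff"),
    ("Annie", "Brie"), ("Matt", "Brie"), ("Brie", "Elsa"), ("Brie", "Jeff"),
    ("John", "Jeff"),
]
scores = pagerank(knows)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

In this variant the scores sum to 1, and people whom nobody knows (Dan and John) end up with the lowest score, since they receive no rank from incoming relationships.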

8. Writing back to Neo4j

The GDS Session’s in-memory graph was projected from data in our specified Neo4j database. Write-back operations will thus persist the data to that same Neo4j database. Let’s write back the results of the PageRank and FastRP algorithms to the Neo4j database.

# if this fails once with some error like "unable to retrieve routing table"
# then run it again. this is a transient error with a stale server cache.
gds.graph.nodeProperties.write(G, ["pagerank", "fastRP"])

Of course, we can also run algorithms directly in write mode. Let’s run Louvain in write mode to demonstrate:

gds.louvain.write(G, writeProperty="louvain")

We can now use the gds.run_cypher() method to query the updated graph. Note that the run_cypher() method will run the query on the Neo4j database.

gds.run_cypher(
    """
    MATCH (p:Person)
    RETURN p.name, p.pagerank AS rank, p.louvain
    ORDER BY rank DESC
    """
)

9. Deleting the session

Now that we have finished our analysis, we can delete the session. The results that we produced were written back to our Neo4j database, and will not be lost. If we computed additional things that we did not write back, those will be lost.

Deleting the session will release all resources associated with it, and stop incurring costs.

gds.delete()

# or sessions.delete(session_name="people-and-fruits-sm")

# let's also make sure the deleted session is truly gone:
sessions.list()

# Lastly, let's clean up the database
gds.run_cypher("MATCH (n:Person|Fruit) DETACH DELETE n")
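Because a forgotten session keeps incurring costs until its TTL expires, one option is to tie the deletion to a `with` block so that it runs even when the analysis raises an exception. The sketch below is an assumption-laden illustration, not a feature of the library: `deleting` is a hypothetical helper that relies only on the object having a `delete()` method, as `gds` does above, and `FakeSession` is a stand-in so the example runs without a real GDS Session.

```python
from contextlib import contextmanager


@contextmanager
def deleting(session):
    """Yield `session` and call its `delete()` method on exit, even if the body raises."""
    try:
        yield session
    finally:
        session.delete()


class FakeSession:
    """Stand-in used only to demonstrate the pattern without a real GDS Session."""

    def __init__(self):
        self.deleted = False

    def delete(self):
        self.deleted = True


fake = FakeSession()
try:
    with deleting(fake) as session:
        raise ValueError("analysis failed")
except ValueError:
    pass
print(fake.deleted)  # True
```

With a real session, the object returned by `sessions.get_or_create()` would take the place of `fake`, and the projection, algorithm, and write-back steps would go inside the `with` block.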

10. Conclusion

And we’re done! We have created a GDS Session, projected a graph, run some algorithms, written back the results, and deleted the session. This is a simple example, but it shows the main steps of using GDS Sessions.