Kuzu
Kùzu is an in-process property graph database management system.
This notebook shows how to use LLMs to provide a natural language interface to a Kùzu database with the Cypher graph query language. Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph.
Setting up
Install the Python package:
pip install kuzu
Create a database on the local machine and connect to it:
import kuzu
db = kuzu.Database("test_db")
conn = kuzu.Connection(db)
First, we create the schema for a simple movie database:
conn.execute("CREATE NODE TABLE Movie (name STRING, PRIMARY KEY(name))")
conn.execute(
"CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))"
)
conn.execute("CREATE REL TABLE ActedIn (FROM Person TO Movie)")
<kuzu.query_result.QueryResult at 0x1066ff410>
Then we can insert some data.
conn.execute("CREATE (:Person {name: 'Al Pacino', birthDate: '1940-04-25'})")
conn.execute("CREATE (:Person {name: 'Robert De Niro', birthDate: '1943-08-17'})")
conn.execute("CREATE (:Movie {name: 'The Godfather'})")
conn.execute("CREATE (:Movie {name: 'The Godfather: Part II'})")
conn.execute(
"CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})"
)
conn.execute(
"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)"
)
conn.execute(
"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)"
)
conn.execute(
"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)"
)
conn.execute(
"MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)"
)
<kuzu.query_result.QueryResult at 0x107016210>
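The four relationship statements above differ only in the person and movie names, so they could also be generated from a list of pairs. The following is a minimal sketch and not part of the original notebook: `acted_in_statement` and `ROLES` are hypothetical names, and the helper assumes that names contain no single quotes (it raises an error otherwise rather than attempting to escape them).

```python
# Hypothetical helper: builds the same MATCH ... CREATE statement used above
# for one (person, movie) pair. Assumes names contain no single quotes.
def acted_in_statement(person: str, movie: str) -> str:
    for value in (person, movie):
        if "'" in value:
            raise ValueError(f"unsupported quote in name: {value!r}")
    return (
        "MATCH (p:Person), (m:Movie) "
        f"WHERE p.name = '{person}' AND m.name = '{movie}' "
        "CREATE (p)-[:ActedIn]->(m)"
    )


# The same four (person, movie) pairs inserted above.
ROLES = [
    ("Al Pacino", "The Godfather"),
    ("Al Pacino", "The Godfather: Part II"),
    ("Al Pacino", "The Godfather Coda: The Death of Michael Corleone"),
    ("Robert De Niro", "The Godfather: Part II"),
]

statements = [acted_in_statement(person, movie) for person, movie in ROLES]
for statement in statements:
    print(statement)
    # With the connection created during setup, each statement could be run
    # with: conn.execute(statement)
```

Run the generated statements instead of the hand-written ones, not in addition to them, so that each relationship is created only once.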
Creating KuzuQAChain
We can now create the KuzuGraph and KuzuQAChain. To create the KuzuGraph, we simply need to pass the database object to the KuzuGraph constructor.
from langchain.chains import KuzuQAChain
from langchain_community.graphs import KuzuGraph
from langchain_openai import ChatOpenAI
graph = KuzuGraph(db)
chain = KuzuQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)
Refresh graph schema information
If the database schema changes, you can refresh the schema information needed to generate Cypher statements.
# graph.refresh_schema()
print(graph.get_schema)
Node properties: [{'properties': [('name', 'STRING')], 'label': 'Movie'}, {'properties': [('name', 'STRING'), ('birthDate', 'STRING')], 'label': 'Person'}]
Relationships properties: [{'properties': [], 'label': 'ActedIn'}]
Relationships: ['(:Person)-[:ActedIn]->(:Movie)']
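To illustrate why this schema text matters, the sketch below rebuilds the three printed lines from plain Python data and places them in a prompt for Cypher generation. It is only an illustration under stated assumptions: `format_schema` and `PROMPT_TEMPLATE` are hypothetical names, and the prompt wording is invented here and is not the prompt that KuzuQAChain actually uses.

```python
# Schema data copied from the printed output above.
node_properties = [
    {"properties": [("name", "STRING")], "label": "Movie"},
    {"properties": [("name", "STRING"), ("birthDate", "STRING")], "label": "Person"},
]
rel_properties = [{"properties": [], "label": "ActedIn"}]
relationships = ["(:Person)-[:ActedIn]->(:Movie)"]


def format_schema(node_properties, rel_properties, relationships) -> str:
    """Render the schema in the same three-line layout as the output above."""
    return (
        f"Node properties: {node_properties}\n"
        f"Relationships properties: {rel_properties}\n"
        f"Relationships: {relationships}"
    )


# Hypothetical prompt wording, for illustration only.
PROMPT_TEMPLATE = (
    "Write a Cypher query for the question below.\n"
    "Use only the labels and properties in this schema:\n"
    "{schema}\n"
    "Question: {question}"
)

schema_text = format_schema(node_properties, rel_properties, relationships)
prompt = PROMPT_TEMPLATE.format(
    schema=schema_text, question="Who played in The Godfather: Part II?"
)
print(prompt)
```

If the schema text is stale, the model may reference labels or properties that no longer exist, which is why refreshing it after a schema change is useful.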
Querying the graph
We can now use the KuzuQAChain to ask questions of the graph.
chain.run("Who played in The Godfather: Part II?")
> Entering new chain...
Generated Cypher:
MATCH (p:Person)-[:ActedIn]->(m:Movie {name: 'The Godfather: Part II'}) RETURN p.name
Full Context:
[{'p.name': 'Al Pacino'}, {'p.name': 'Robert De Niro'}]
> Finished chain.
'Al Pacino and Robert De Niro both played in The Godfather: Part II.'
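The verbose trace above suggests three steps: a Cypher query is generated from the question, the query is run against the graph, and the resulting rows are turned into a sentence. The sketch below mirrors that flow with plain functions. It is not KuzuQAChain's implementation: `answer_question` and the three `fake_*` stand-ins are hypothetical and simply return the values shown in the trace instead of calling an LLM or a database.

```python
from typing import Callable, Dict, List

Rows = List[Dict[str, str]]


def answer_question(
    question: str,
    schema: str,
    generate_cypher: Callable[[str, str], str],
    run_query: Callable[[str], Rows],
    summarize: Callable[[str, Rows], str],
) -> str:
    """Hypothetical three-step flow: question -> Cypher -> rows -> answer."""
    cypher = generate_cypher(question, schema)  # step 1: "Generated Cypher"
    rows = run_query(cypher)                    # step 2: "Full Context"
    return summarize(question, rows)            # step 3: final answer


# Stand-ins that replay the values from the trace above.
def fake_generate_cypher(question: str, schema: str) -> str:
    return (
        "MATCH (p:Person)-[:ActedIn]->"
        "(m:Movie {name: 'The Godfather: Part II'}) RETURN p.name"
    )


def fake_run_query(cypher: str) -> Rows:
    return [{"p.name": "Al Pacino"}, {"p.name": "Robert De Niro"}]


def fake_summarize(question: str, rows: Rows) -> str:
    names = " and ".join(row["p.name"] for row in rows)
    return f"{names} both played in The Godfather: Part II."


answer = answer_question(
    "Who played in The Godfather: Part II?",
    "(:Person)-[:ActedIn]->(:Movie)",
    fake_generate_cypher,
    fake_run_query,
    fake_summarize,
)
print(answer)
```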
chain.run("Robert De Niro played in which movies?")
> Entering new chain...
Generated Cypher:
MATCH (p:Person {name: 'Robert De Niro'})-[:ActedIn]->(m:Movie)
RETURN m.name
Full Context:
[{'m.name': 'The Godfather: Part II'}]
> Finished chain.
'Robert De Niro played in The Godfather: Part II.'
chain.run("Robert De Niro is born in which year?")
> Entering new chain...
Generated Cypher:
MATCH (p:Person {name: 'Robert De Niro'})-[:ActedIn]->(m:Movie)
RETURN p.birthDate
Full Context:
[{'p.birthDate': '1943-08-17'}]
> Finished chain.
'Robert De Niro was born on August 17, 1943.'
chain.run("Who is the oldest actor who played in The Godfather: Part II?")
> Entering new chain...
Generated Cypher:
MATCH (p:Person)-[:ActedIn]->(m:Movie{name:'The Godfather: Part II'})
WITH p, m, p.birthDate AS birthDate
ORDER BY birthDate ASC
LIMIT 1
RETURN p.name
Full Context:
[{'p.name': 'Al Pacino'}]
> Finished chain.
'The oldest actor who played in The Godfather: Part II is Al Pacino.'
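The generated query orders by birthDate, which this schema stores as a STRING. That still gives the right answer here because the dates are written in YYYY-MM-DD form, where alphabetical order and chronological order coincide. The short check below, which is not part of the original notebook, confirms this for the two actors using only the Python standard library.

```python
from datetime import date

# Birth dates as stored in the Person table above (STRING, YYYY-MM-DD).
birth_dates = {
    "Al Pacino": "1940-04-25",
    "Robert De Niro": "1943-08-17",
}

# Sort names by the raw string, as ORDER BY does on a STRING column.
by_string = sorted(birth_dates, key=lambda name: birth_dates[name])

# Sort names by the parsed calendar date for comparison.
by_date = sorted(birth_dates, key=lambda name: date.fromisoformat(birth_dates[name]))

oldest = by_string[0]
print(by_string == by_date)  # True
print(oldest)                # Al Pacino
```

This equivalence would not hold for other formats such as DD-MM-YYYY, so a string column is only safe to sort when every value follows the same zero-padded year-first layout.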