Myk's Digital Garden
RhizomeDB

A New Kind of Database

This is the third essay in a series about databases, relationality and a new approach to managing data. See the earlier essays in the series for part 1 and part 2.
In the last essay, we introduced a snapshot of this table with only the first three rows visible. In this essay I’d like to talk about the fourth row included below and what it might look like to construct such a thing.
| # | Data Store Type | Structure | Rhizome | Notes |
| --- | --- | --- | --- | --- |
| 1 | RDBMS | Structured | Integrated | In a traditional SQL database normalized across tables that rhizome is immanent and extant within the schema of the database. You don’t need any external indexes, you don’t need institutional knowledge, you don’t need specifically tailored queries - the rhizome exists within the structure of the database itself. |
| 2 | No-SQL | Unstructured | Discarded | A No-SQL database like CouchDB dispenses with the normalization and skips the relationality - as a result, it loses its rhizomatic aspect. Your data is no longer intelligible as a deeply interconnected network expressing itself as seemingly independent things - you’ve gone ahead and modeled it as actually independent things. |
| 3 | Data Lake | Partially Structured | External | A data-lake with an engineering team behind it and modern tooling takes the flexibility of a No-SQL store but reconnects the rhizomatic structure by migrating the relationality out into the indexes, queries and institutional knowledge required to work with it effectively. |
| 4 | RhizomeDB | Unstructured | Integrated | My goal with Rhizome is to create a functionally unstructured data store with an integrated rhizome. This is achieved by hypernormalizing transparently - so it’s actually fully structured, but in a way that you don’t have to think about. |

RhizomeDB

RhizomeDB is a new kind of data store that allows you to store data in a way that’s functionally unstructured while preserving integrated relationality. It does this by locking down a few moving parts.
We don’t store state, actually. At least, not in the way you’re used to thinking about it. The atomic unit of RhizomeDB is a delta.
RhizomeDB is append-only, and once written a delta is immutable.
We have a single fully universal schema (the Delta schema) which is used to model all data in the system. This allows the interface to be what I call “functionally unstructured” - the tool ensures that the fully general schema is applied at write-time automatically.
A delta in this system is technically a
with some specific properties. A traditional RDBMS normalizes data by breaking records down across rows and columns; Rhizome normalizes data by breaking records down into deltas. Rhizome itself is agnostic as to how these deltas are persisted, indexed or otherwise treated; those details can vary across implementations.
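The append-only, immutable rule can be sketched in a few lines. This is a minimal in-memory illustration under my own assumptions, not RhizomeDB's actual storage layer: `DeltaLog`, `append` and `all` are hypothetical names, `UUID` is assumed to be a plain string, and the delta shape mirrors the Delta schema described in the next section.

```typescript
type UUID = string; // assumption: UUIDs are carried as plain strings
type Primitive = string | number | boolean;

interface DeltaPointer {
  local_context: string;
  target: UUID | Primitive;
  target_context?: string;
}

interface RhizomaticDelta {
  id: UUID;
  timestamp: Date;
  creator: UUID;
  host: UUID;
  transaction: UUID;
  pointers: DeltaPointer[];
}

// Hypothetical in-memory log: deltas can be appended and read, never edited.
class DeltaLog {
  private readonly entries: RhizomaticDelta[] = [];

  append(delta: RhizomaticDelta): void {
    if (this.entries.some((d) => d.id === delta.id)) {
      throw new Error("delta " + delta.id + " has already been written");
    }
    // Shallow-freeze the delta and its pointers so later property writes
    // have no effect (and throw in strict mode).
    delta.pointers.forEach((p) => Object.freeze(p));
    Object.freeze(delta.pointers);
    Object.freeze(delta);
    this.entries.push(delta);
  }

  // Readers get a copy of the list, so the log itself cannot be reordered.
  all(): RhizomaticDelta[] {
    return this.entries.slice();
  }
}
```

The freeze is shallow, so a real implementation would also need to protect nested values such as the `Date`.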

Delta Schema

A Delta has a specific shape. Here’s a TypeScript interface defining a delta:
type UUID = string // assumed alias so the interface compiles on its own
type primitive = string | number | boolean

interface RhizomaticDelta {
  id: UUID
  timestamp: Date
  creator: UUID
  host: UUID
  transaction: UUID
  pointers: {
    local_context: string
    target: UUID | primitive
    target_context?: string
  }[]
}
The idea is that this delta represents a specific association between one or more things, according to some specific creator, as of some point in time, as captured in some specific system, as a part of some specific transaction.
Let’s look at a concrete example. Let’s say that we are capturing some information from a Movies database as discussed in a prior essay. We will define a delta that changes the universe of our datastore such that as of some timestamp T in our datastore’s history it is the case that Keanu Reeves starred as Neo in The Matrix. That delta might look something like this:
// the ID of this delta, unique to this delta
const id: UUID = "..."

// the timestamp as of which this delta is true for us
const timestamp: Date = new Date()

// a UUID representing a user inserting this data
const creator: UUID = "..."

// a UUID representing this specific data store
const host: UUID = "..."

// a UUID representing the transaction containing this delta
const transaction: UUID = "..."

// a UUID representing Keanu Reeves
const keanu: UUID = "..."

// a UUID representing the character Neo
const neo: UUID = "..."

// a UUID representing the film The Matrix
const the_matrix: UUID = "..."

// primitive values targeted directly rather than by reference
const salary: number = 10000000
const currency: string = "usd"

const delta: RhizomaticDelta = {
  id,
  timestamp,
  creator,
  host,
  transaction,
  pointers: [
    {
      local_context: "actor",
      target: keanu,
      target_context: "roles"
    },
    {
      local_context: "role",
      target: neo,
      target_context: "actor"
    },
    {
      local_context: "film",
      target: the_matrix,
      target_context: "cast"
    },
    {
      local_context: "salary",
      target: salary
    },
    {
      local_context: "salary_currency",
      target: currency
    }
  ]
}
So you create this delta, wrap it in a transaction with potentially other deltas that you want to either succeed-or-fail together, and then push it to the Rhizome engine. Rhizome then appends this delta to the canonical append-only stream of deltas, and then updates any indexes you’ve got that are paying attention to any combination of the domain entities targeted or contexts referenced.
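As a rough illustration of that write path, here is a minimal sketch of an engine that validates a whole transaction before appending anything and then updates a single index keyed by target. `RhizomeEngine`, `push`, `size` and `deltasTargeting` are hypothetical names of mine, and the two validation rules are only stand-ins for whatever checks a real engine would run.

```typescript
type UUID = string; // assumption: UUIDs are carried as plain strings
type Primitive = string | number | boolean;

interface DeltaPointer {
  local_context: string;
  target: UUID | Primitive;
  target_context?: string;
}

interface RhizomaticDelta {
  id: UUID;
  timestamp: Date;
  creator: UUID;
  host: UUID;
  transaction: UUID;
  pointers: DeltaPointer[];
}

// Hypothetical engine: one append-only stream plus one index keyed by target.
class RhizomeEngine {
  private readonly stream: RhizomaticDelta[] = [];
  // Keys are stringified targets, so 10000000 and "10000000" share a bucket.
  private readonly byTarget: Record<string, RhizomaticDelta[] | undefined> =
    Object.create(null);

  // Validate every delta before writing any, so the batch succeeds or fails together.
  push(transaction: UUID, deltas: RhizomaticDelta[]): void {
    for (const delta of deltas) {
      if (delta.transaction !== transaction) {
        throw new Error("delta " + delta.id + " belongs to another transaction");
      }
      if (delta.pointers.length === 0) {
        throw new Error("delta " + delta.id + " has no pointers");
      }
    }
    for (const delta of deltas) {
      this.stream.push(delta);
      for (const pointer of delta.pointers) {
        const key = String(pointer.target);
        let bucket = this.byTarget[key];
        if (bucket === undefined) {
          bucket = [];
          this.byTarget[key] = bucket;
        }
        if (bucket.indexOf(delta) === -1) bucket.push(delta);
      }
    }
  }

  size(): number {
    return this.stream.length;
  }

  deltasTargeting(target: UUID | Primitive): RhizomaticDelta[] {
    return (this.byTarget[String(target)] || []).slice();
  }
}
```

An index on `local_context` or `target_context` could be maintained in the same loop; only the key changes.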
The prelude - those fields above pointers - feels pretty self-explanatory, right? This is just meta-data on the delta itself, so we can track where it came from. This is useful because deltas can be shared between Rhizome systems - if you grab my movie data and add it to your local system, you still want to know which deltas came from me, etc.
The more tricky part is the pointers array, so let’s break that down a bit.

Understanding Pointers

A delta is a relationship between domain entities and/or primitives that’s true within some system according to some user as of some point in time. Looking at our delta above, we can see that we are asserting a specific relationship between a specific actor, a specific role, a specific film and a specific salary. But our domain entities - Keanu, Neo, The Matrix - are all pass-by-reference using UUIDs. The delta itself doesn’t contain any information about them except what it’s asserting.
You might think this means that we have to have some separate pass somewhere where we create these entities, right? Like, I need to define a “Keanu Reeves” entity before I can reference it, right?
Well... no, actually. All I need to do is make sure that I’m using a consistent UUID for every domain model. Maybe this is the only delta in my local database that refers to Keanu Reeves. If that’s the case, nowhere is his name or gender or birthday specified - that information just isn’t in the system yet. But I can add additional deltas that point to the same UUID, target the string “Keanu Reeves” or the string “male” or some timestamp etc., and specify contexts to articulate why those primitives are being targeted.
This means that “Keanu Reeves”, in this hypothetical system, does not exist except as the collection of deltas that reference the same UUID.
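To make that concrete, here is a small sketch of adding descriptive deltas after the fact and then recovering “Keanu” purely by UUID. Everything named here is illustrative: `describeEntity` and `deltasReferencing` are hypothetical helpers, the `"subject"` local context is my own choice, and the `uuid-...` strings are placeholders standing in for real UUIDs.

```typescript
type UUID = string; // assumption: UUIDs are carried as plain strings
type Primitive = string | number | boolean;

interface DeltaPointer {
  local_context: string;
  target: UUID | Primitive;
  target_context?: string;
}

interface RhizomaticDelta {
  id: UUID;
  timestamp: Date;
  creator: UUID;
  host: UUID;
  transaction: UUID;
  pointers: DeltaPointer[];
}

const keanuId: UUID = "uuid-keanu"; // placeholder, not a real UUID

// Hypothetical helper: assert that `entity` has `value` under `context`.
function describeEntity(
  entity: UUID,
  context: string,
  value: Primitive,
  seq: number
): RhizomaticDelta {
  return {
    id: "delta-" + seq,
    timestamp: new Date(),
    creator: "uuid-creator",
    host: "uuid-host",
    transaction: "uuid-transaction",
    pointers: [
      // From the entity's side, this delta defines one of its `context` values.
      { local_context: "subject", target: entity, target_context: context },
      // The primitive itself carries no target_context.
      { local_context: context, target: value },
    ],
  };
}

const descriptiveDeltas: RhizomaticDelta[] = [
  describeEntity(keanuId, "name", "Keanu Reeves", 1),
  describeEntity(keanuId, "gender", "male", 2),
];

// "Keanu" is nothing more than the deltas that point at his UUID.
function deltasReferencing(entity: UUID, deltas: RhizomaticDelta[]): RhizomaticDelta[] {
  return deltas.filter((d) => d.pointers.some((p) => p.target === entity));
}
```

No entity record is ever created: asking for a UUID nobody has pointed at simply yields an empty list.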
The fields on a pointer are sort of like the components of an RDF triple:
local_context says “what is the target from the perspective of this delta?”
target is a reference to some domain entity or primitive value
target_context, which is optional, says “what is this delta defining from the perspective of this target?”
This lets me then do things at query time like “Grab all deltas that point to the keanu UUID in any way and integrate them into a single object.” If our delta above is included, then that object would look in part like this:
{
  ...
  roles: [
    // we can losslessly embed this entire delta inside of this view of the `keanu` object, and just leave out the pointer to keanu because it's implicit in this view
    {
      id,
      ...,
      pointers: [
        { film: the_matrix }, // note film and role values are UUIDs here
        { role: neo },
        { salary: 10000000 },
        { salary_currency: "usd" }
      ]
    }
  ]
}
So our lossless Keanu object includes a set of all deltas that reference keanu, organized based on target_context if provided.
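A query like that could be sketched as a single pass over the deltas. This is only an illustration of the grouping rule described above: `LosslessView` and `integrate` are hypothetical names, and the view shape (an `id` plus a `properties` map keyed by `target_context`) is my own assumption.

```typescript
type UUID = string; // assumption: UUIDs are carried as plain strings
type Primitive = string | number | boolean;

interface DeltaPointer {
  local_context: string;
  target: UUID | Primitive;
  target_context?: string;
}

interface RhizomaticDelta {
  id: UUID;
  timestamp: Date;
  creator: UUID;
  host: UUID;
  transaction: UUID;
  pointers: DeltaPointer[];
}

// Every delta that targets the entity, grouped by the pointer's target_context.
interface LosslessView {
  id: UUID;
  properties: Record<string, RhizomaticDelta[]>;
}

function integrate(entity: UUID, deltas: RhizomaticDelta[]): LosslessView {
  const view: LosslessView = { id: entity, properties: Object.create(null) };
  for (const delta of deltas) {
    for (const pointer of delta.pointers) {
      const context = pointer.target_context;
      // Pointers without a target_context do not surface on the target.
      if (pointer.target !== entity || context === undefined) continue;
      // Embed the delta, leaving out the pointer back to the entity itself.
      const embedded: RhizomaticDelta = {
        ...delta,
        pointers: delta.pointers.filter((other) => other !== pointer),
      };
      let bucket = view.properties[context];
      if (bucket === undefined) {
        bucket = [];
        view.properties[context] = bucket;
      }
      bucket.push(embedded);
    }
  }
  return view;
}
```

Calling `integrate` with the Keanu UUID and the delta above would put one embedded delta under `roles`; calling it with the film's UUID would put the same delta under `cast`.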
If target_context was left off - as it was for salary and salary_currency - then the delta can be left out of such integrated views when the user looks up 10000000 or “usd” as values in our system.
You can see how we can actually then further break this down - a lossless view that embeds all deltas that reference Keanu is large and complex and hard to read. So why not have some basic rules that allow us to flatten any given lossless view like the above into something like this:
{
  ...,
  roles: [
    { film: "the matrix", role: "neo", salary: 10000000, salary_currency: "usd" },
    ...
  ]
}
This is what I mean when I say that we treat state as a derived side-effect. Our final usable view of Keanu first generates a lossless integral of deltas then flattens those down into a lossy snapshot. What’s cool is that different users can all make assertions about the nature of Keanu and our data store can actually hold all of those assertions at once - even if they contradict each other!
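The flattening step might look something like the sketch below. It is only one possible rule set: `flattenPointers` is a hypothetical helper, and the `labels` lookup from entity UUID to display string is an assumption of mine (in practice those labels would themselves be derived from other deltas).

```typescript
type UUID = string; // assumption: UUIDs are carried as plain strings
type Primitive = string | number | boolean;

interface DeltaPointer {
  local_context: string;
  target: UUID | Primitive;
  target_context?: string;
}

type FlatRecord = Record<string, Primitive>;

// Collapse one embedded delta's pointers into a plain key/value record.
function flattenPointers(
  pointers: DeltaPointer[],
  labels: Record<string, string | undefined>
): FlatRecord {
  const flat: FlatRecord = {};
  for (const pointer of pointers) {
    const target = pointer.target;
    const label = typeof target === "string" ? labels[target] : undefined;
    // Lossy step: delta ids, creators and timestamps are dropped, and a
    // repeated local_context keeps only the last value seen.
    flat[pointer.local_context] = label !== undefined ? label : target;
  }
  return flat;
}

const flatRole = flattenPointers(
  [
    { local_context: "film", target: "uuid-the-matrix", target_context: "cast" },
    { local_context: "role", target: "uuid-neo", target_context: "actor" },
    { local_context: "salary", target: 10000000 },
    { local_context: "salary_currency", target: "usd" },
  ],
  { "uuid-the-matrix": "the matrix", "uuid-neo": "neo" }
);
// flatRole is { film: "the matrix", role: "neo", salary: 10000000, salary_currency: "usd" }
```

Because the lossless view is kept, a different set of flattening rules can always be applied later without touching the stored deltas.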
We can then have rules in our query engine or in our flattening flow that resolve conflicts along some contexts, throw errors at conflicts along others or simply return interpolated representations along yet others.
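Here is a minimal sketch of what such per-context rules could look like. The three policy names and the `Assertion` shape are hypothetical, and returning every distinct value under `"all"` is just my simple stand-in for an interpolated representation.

```typescript
type UUID = string; // assumption: UUIDs are carried as plain strings
type Primitive = string | number | boolean;

// One creator's claim about a single context on a single entity.
interface Assertion {
  value: Primitive;
  timestamp: Date;
  creator: UUID;
}

// Hypothetical per-context policies; the names are illustrative only.
type ConflictPolicy = "latest" | "strict" | "all";

function resolveConflict(
  assertions: Assertion[],
  policy: ConflictPolicy
): Primitive | Primitive[] {
  // Distinct values in the order they were first asserted.
  const distinct = assertions
    .map((a) => a.value)
    .filter((value, index, values) => values.indexOf(value) === index);
  const [first] = distinct;
  if (first === undefined) throw new Error("nothing was asserted");

  switch (policy) {
    case "strict":
      // Treat disagreement as an error the caller has to handle.
      if (distinct.length > 1) {
        throw new Error("conflicting values: " + distinct.join(", "));
      }
      return first;
    case "latest":
      // Resolve by picking the most recently asserted value.
      return assertions.reduce((newest, a) =>
        a.timestamp.getTime() > newest.timestamp.getTime() ? a : newest
      ).value;
    case "all":
      // Surface the disagreement instead of hiding it.
      return distinct;
  }
}
```

Nothing is deleted by any of these policies; they only decide what the flattened view shows.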
The point is that “keanu” now exists as the collection of all of the various references to him in the scope of our system. In fact, you can think of Keanu as a sort of “node” whose properties are all dynamically assigned via these sort of high-dimensional “edges”....

Wait... this is a Graph!

A hypergraph, technically. Our deltas are isomorphic to hyperedges in a hypergraph. “State” in RhizomeDB is always an integral of all hyperedges selected by a query, then optionally flattened down into a lossy representation of the nodes they reference.
In this way we’re maintaining the relationality of the domain data by creating a single unified graph that represents the rhizome itself. We don’t need to break our data apart into separate tables as long as we maintain a unified delta schema and then build indexes that work on pointer fields annotated by prelude metadata.
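One way to see the graph is to read each delta as a single edge that joins every target it points to. The sketch below walks the deltas that way; `neighbours` is a hypothetical helper, and the `uuid-...` strings are placeholders rather than real UUIDs.

```typescript
type UUID = string; // assumption: UUIDs are carried as plain strings
type Primitive = string | number | boolean;

interface DeltaPointer {
  local_context: string;
  target: UUID | Primitive;
  target_context?: string;
}

interface RhizomaticDelta {
  id: UUID;
  timestamp: Date;
  creator: UUID;
  host: UUID;
  transaction: UUID;
  pointers: DeltaPointer[];
}

// Everything that shares at least one delta with `entity`, in first-seen order.
function neighbours(entity: UUID, deltas: RhizomaticDelta[]): Primitive[] {
  const seen: Primitive[] = [];
  for (const delta of deltas) {
    // A delta only connects the entity to others if it points at the entity.
    if (!delta.pointers.some((p) => p.target === entity)) continue;
    for (const pointer of delta.pointers) {
      if (pointer.target !== entity && seen.indexOf(pointer.target) === -1) {
        seen.push(pointer.target);
      }
    }
  }
  return seen;
}

const matrixCasting: RhizomaticDelta = {
  id: "uuid-delta-1",
  timestamp: new Date(),
  creator: "uuid-creator",
  host: "uuid-host",
  transaction: "uuid-transaction",
  pointers: [
    { local_context: "actor", target: "uuid-keanu", target_context: "roles" },
    { local_context: "role", target: "uuid-neo", target_context: "actor" },
    { local_context: "film", target: "uuid-the-matrix", target_context: "cast" },
  ],
};

const linkedToKeanu = neighbours("uuid-keanu", [matrixCasting]);
// linkedToKeanu is ["uuid-neo", "uuid-the-matrix"]
```

A single delta here connects three entities at once, which is exactly what an ordinary two-ended edge cannot express.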
But this is different from traditional graph databases (a subtype of No-SQL database, usually) because we’re not merely tracking nodes and edges. Because we’re using a hypergraph that points to domain entities via reference we’re actually fully compliant with relational constraints, and the whole thing collapses down into a stream of CRDTs.

More to Come

This has gotten very long and I’m very tired, haha, so I will write more later. At a high level, here are some things yet to discuss:
Fully relational at query-time
Highly scalable, designed from the ground-up for distributed architectures
Fork/Merge data at the atomic level by diffing and sharing deltas between users or stores
Avoid “last write wins” by always having a full history to draw from - conflicts move to application layer
Every query into the rhizome returns a smaller rhizome
Use streaming to get eventual consistency between large analytical “data lake” rhizomes and small optimized “operational” rhizomes.
Delta triggers - pipe streams of deltas into reactive functions which create more deltas.
Fuzzy Graph Queries - a separate project of mine that I still need to document looks at using Vector Stores to index the names of both nodes and edges in graph databases. This allows for “fuzzy” matching at both write time (“use this existing string instead of the one you’re proposing” gives convergent schemas) and read time (“find and group all nodes connected via edges with names within X similarity of term Y”). Applying these principles to both the local_context and target_context fields on the pointers means we can tune how “literal” we want our queries to be.
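As a small taste of the fork/merge item in the list above, here is a sketch of diffing and merging two stores by delta id. It assumes that deltas are immutable and that ids are globally unique, so two deltas with the same id can be treated as the same delta; `missingFrom` and `mergeStores` are hypothetical names, not part of any existing API.

```typescript
type UUID = string; // assumption: UUIDs are carried as plain strings

// Only the id matters for diffing, so any delta-shaped object will do.
interface Identified {
  id: UUID;
}

// Deltas present in `theirs` but not in `mine`, matched by id.
function missingFrom<T extends Identified>(mine: T[], theirs: T[]): T[] {
  const known: Record<string, boolean> = Object.create(null);
  for (const delta of mine) known[delta.id] = true;
  const missing: T[] = [];
  for (const delta of theirs) {
    if (known[delta.id]) continue;
    known[delta.id] = true; // also skips repeats within `theirs`
    missing.push(delta);
  }
  return missing;
}

// Merging is a union by id; nothing already in `mine` is touched.
function mergeStores<T extends Identified>(mine: T[], theirs: T[]): T[] {
  return mine.concat(missingFrom(mine, theirs));
}
```

Merging the same store in twice changes nothing, which is what makes sharing deltas between users or hosts safe to repeat.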