RhizomeDB

A New Kind of Database

This is the third essay in a series about databases, relationality and a new approach to managing data. See “A Short and Mostly Wrong History of Databases” for part 1 and then “Understanding Rhizomes” for part 2.

In the last essay, we introduced a snapshot of this table with only the first three rows visible. In this essay I’d like to talk about the fourth row included below and what it might look like to construct such a thing.

Data Store Type

Structure

Rhizome

Notes

RDBMS

In a traditional SQL database normalized across tables, the rhizome is immanent and extant within the schema of the database. You don’t need any external indexes, you don’t need institutional knowledge, you don’t need specifically tailored queries - the rhizome exists within the structure of the database itself.

No-SQL

A No-SQL database like CouchDB dispenses with the normalization and skips the relationality - as a result, it loses its rhizomatic aspect. Your data is no longer intelligible as a deeply interconnected network expressing itself as seemingly independent things - you’ve gone ahead and modeled it as actually independent things.

Data Lake

A data-lake with an engineering team behind it and modern tooling takes the flexibility of a No-SQL store but reconnects the rhizomatic structure by migrating the relationality out into the indexes, queries and institutional knowledge required to work with it effectively.

RhizomeDB

My goal with Rhizome is to create a functionally unstructured data store with an integrated rhizome. This is achieved by hypernormalizing transparently - so it’s actually fully structured, but in a way that you don’t have to think about.


RhizomeDB

RhizomeDB is a new kind of data store that allows you to store data in a way that’s functionally unstructured while preserving integrated relationality. It does this by locking down a few moving parts.

We don’t store state, actually. At least, not in the way you’re used to thinking about it. The atomic unit of RhizomeDB is a delta.

RhizomeDB is append-only, and once written a delta is immutable.

We have a single fully universal schema (the Delta schema) which is used to model all data in the system. This allows the interface to be what I call “functionally unstructured” - the tool ensures that the fully general schema is applied at write-time automatically.

A delta in this system is technically a CRDT with some specific properties. A traditional RDBMS normalizes data by breaking records down across rows and columns; Rhizome normalizes data by breaking records down into deltas. Rhizome itself is agnostic as to how these deltas are persisted, indexed or otherwise treated; those choices can vary across implementations.

Delta Schema

A delta has a specific shape. Here’s a TypeScript interface defining a delta:

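    // assumption: UUIDs are carried as strings; the essay does not define this type
    type UUID = string

    type primitive = string | number | boolean

    interface RhizomaticDelta {
      id: UUID            // unique to this delta
      timestamp: Date     // when this delta became true for us
      creator: UUID       // who asserted it
      host: UUID          // the system it was captured in
      transaction: UUID   // the transaction it belongs to
      pointers: {
        local_context: string
        target: UUID | primitive
        target_context?: string
      }[]
    }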

The idea is that this delta represents a specific association between one or more things, according to some specific creator, as of some point in time, as captured in some specific system, as a part of some specific transaction.

Let’s look at a concrete example. Let’s say that we are capturing some information from a Movies database as discussed in a prior essay. We will define a delta that changes the universe of our datastore such that as of some timestamp T in our datastore’s history it is the case that Keanu Reeves starred as Neo in The Matrix. That delta might look something like this:

    // the ID of this delta, unique to this delta
    const id: UUID = "..."

    // the timestamp as of which this delta is true for us
    const timestamp: Date = new Date()

    // a UUID representing a user inserting this data
    const creator: UUID = "..."

    // a UUID representing this specific data store
    const host: UUID = "..."

    // a UUID representing the transaction containing this delta
    const transaction: UUID = "..."

    // a UUID representing Keanu Reeves
    const keanu: UUID = "..."

    // a UUID representing the character Neo
    const neo: UUID = "..."

    // a UUID representing the film The Matrix
    const the_matrix: UUID = "..."

    // primitive values, targeted directly rather than by reference
    const salary: number = 10000000
    const currency: string = "usd"

    const delta: RhizomaticDelta = {
      id,
      timestamp,
      creator,
      host,
      transaction,
      pointers: [
        {
          local_context: "actor",
          target: keanu,
          target_context: "roles"
        },
        {
          local_context: "role",
          target: neo,
          target_context: "actor"
        },
        {
          local_context: "film",
          target: the_matrix,
          target_context: "cast"
        },
        {
          local_context: "base_salary",
          target: salary
        },
        {
          local_context: "salary_currency",
          target: currency
        }
      ]
    }

So you create this delta, wrap it in a transaction along with any other deltas that should succeed or fail together, and then push it to the Rhizome engine. Rhizome then appends this delta to the canonical append-only stream of deltas, and then updates any indexes you’ve got that are paying attention to any combination of the domain entities targeted or contexts referenced.
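To make that write path concrete, here is a minimal in-memory sketch of it. Everything named here (DeltaLog, appendTransaction, pointingAt, the string-based UUID type) is a hypothetical illustration rather than RhizomeDB’s actual API, and the single index on pointer targets stands in for whatever indexes a real implementation might maintain.

```typescript
// Hypothetical sketch: DeltaLog and its methods are illustrative names, not RhizomeDB's API.
type UUID = string; // assumption: UUIDs are carried as plain strings
type Primitive = string | number | boolean;

interface Pointer {
  local_context: string;
  target: UUID | Primitive;
  target_context?: string;
}

interface RhizomaticDelta {
  id: UUID;
  timestamp: Date;
  creator: UUID;
  host: UUID;
  transaction: UUID;
  pointers: readonly Pointer[];
}

class DeltaLog {
  // The canonical stream: deltas are only ever appended, never edited or removed.
  private readonly stream: RhizomaticDelta[] = [];
  private readonly ids = new Set<UUID>();
  // One simple index: target value -> deltas that point at it.
  private readonly byTarget = new Map<UUID | Primitive, RhizomaticDelta[]>();

  // Appends every delta in the transaction, or none of them.
  appendTransaction(transaction: UUID, deltas: RhizomaticDelta[]): void {
    // Validate the whole batch before touching the stream.
    const incoming = new Set<UUID>();
    for (const delta of deltas) {
      if (delta.transaction !== transaction) {
        throw new Error(`delta ${delta.id} does not belong to transaction ${transaction}`);
      }
      if (this.ids.has(delta.id) || incoming.has(delta.id)) {
        throw new Error(`duplicate delta id ${delta.id}`);
      }
      incoming.add(delta.id);
    }
    // Commit: freeze a copy of each delta, append it, then update the index.
    for (const delta of deltas) {
      const pointers = Object.freeze(delta.pointers.map((p) => Object.freeze({ ...p })));
      const stored: RhizomaticDelta = Object.freeze({ ...delta, pointers });
      this.stream.push(stored);
      this.ids.add(stored.id);
      const targets = new Set(stored.pointers.map((p) => p.target));
      targets.forEach((target) => {
        const bucket = this.byTarget.get(target) ?? [];
        bucket.push(stored);
        this.byTarget.set(target, bucket);
      });
    }
  }

  // All stored deltas with at least one pointer at the given target.
  pointingAt(target: UUID | Primitive): readonly RhizomaticDelta[] {
    return this.byTarget.get(target) ?? [];
  }

  get size(): number {
    return this.stream.length;
  }
}
```

Validating the whole batch before appending anything is what gives the succeed-or-fail-together behavior in this sketch; a persistent implementation would need something sturdier, such as a write-ahead log.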

The prelude - those fields above pointers - feels pretty self-explanatory, right? This is just metadata on the delta itself, so we can track where it came from. This is useful because deltas can be shared between Rhizome systems - if you grab my movie data and add it to your local system, you still want to know which deltas came from me, etc.

The trickier part is the pointers array, so let’s break that down a bit.

Understanding Pointers

A delta is a relationship between domain entities and/or primitives that’s true within some system according to some user as of some point in time. Looking at our delta above, we can see that we are asserting a specific relationship between a specific actor, a specific role, a specific film and a specific salary. But our domain entities - Keanu, Neo, The Matrix - are all pass-by-reference using UUIDs. The delta itself doesn’t contain any information about them except what it’s asserting.

You might think this means that we have to have some separate pass somewhere where we create these entities, right? Like, I need to define a “Keanu Reeves” entity before I can reference it, right?

Well... no, actually. All I need to do is make sure that I’m using a consistent UUID for each domain entity. Maybe this is the only delta in my local database that refers to Keanu Reeves. If that’s the case, nowhere is his name or gender or birthday specified - that information just isn’t in the system yet. But I can add additional deltas that point to the same UUID, target the string “Keanu Reeves” or the string “male” or some timestamp, etc., and specify contexts to articulate why those primitives are being targeted.

This means that “Keanu Reeves”, in this hypothetical system, does not exist except as the collection of deltas that reference the same UUID.
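Here is a small sketch of that idea, under a few assumptions: the UUIDs are placeholder strings, the context names (“person”, “name”, “gender”, “title”) are made up for illustration, and makeDelta and deltasReferencing are hypothetical helpers rather than anything RhizomeDB provides.

```typescript
// Hypothetical sketch; context names and helper functions are illustrative only.
type UUID = string; // assumption: UUIDs are carried as plain strings
type Primitive = string | number | boolean;

interface Pointer {
  local_context: string;
  target: UUID | Primitive;
  target_context?: string;
}

interface RhizomaticDelta {
  id: UUID;
  timestamp: Date;
  creator: UUID;
  host: UUID;
  transaction: UUID;
  pointers: Pointer[];
}

// Placeholder identifiers standing in for real UUIDs.
const keanu: UUID = "uuid-keanu";
const creator: UUID = "uuid-creator";
const host: UUID = "uuid-host";

let counter = 0;

// Builds a delta with shared metadata so the example can focus on pointers.
function makeDelta(pointers: Pointer[]): RhizomaticDelta {
  counter += 1;
  return {
    id: `uuid-delta-${counter}`,
    timestamp: new Date(),
    creator,
    host,
    transaction: "uuid-tx-1",
    pointers,
  };
}

const deltas: RhizomaticDelta[] = [
  // Nobody "created" Keanu first; these deltas simply point at his UUID.
  makeDelta([
    { local_context: "person", target: keanu, target_context: "name" },
    { local_context: "name", target: "Keanu Reeves" },
  ]),
  makeDelta([
    { local_context: "person", target: keanu, target_context: "gender" },
    { local_context: "gender", target: "male" },
  ]),
  // A delta about something else entirely.
  makeDelta([
    { local_context: "film", target: "uuid-the-matrix", target_context: "title" },
    { local_context: "title", target: "The Matrix" },
  ]),
];

// In this sketch, an entity is nothing more than the deltas that point at its UUID.
function deltasReferencing(entity: UUID, all: RhizomaticDelta[]): RhizomaticDelta[] {
  return all.filter((delta) => delta.pointers.some((p) => p.target === entity));
}

const aboutKeanu = deltasReferencing(keanu, deltas);
console.log(aboutKeanu.length); // 2
```

A UUID that no delta points at simply yields an empty list, which is the sense in which such an entity is not in the system yet.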

The fields on a pointer are sort of like components of an RDF schema, right:

local_context says “what is the target from the perspective of this delta?”

target is a reference to some domain entity or primitive value

target_context, which is optional, says “what is this delta defining from the perspective of this target?”
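One way to see how the two context fields read from each side is the sketch below. It takes the starring delta from earlier (with placeholder strings in place of real UUIDs) and files it under the target_context of whichever entity we are looking from, with each pointer’s local_context as a key. The viewFrom function and the shape it returns are assumptions made for illustration, not necessarily what RhizomeDB produces.

```typescript
// Hypothetical sketch: viewFrom and the View shape are illustrative, not RhizomeDB's API.
type UUID = string; // assumption: UUIDs are carried as plain strings
type Primitive = string | number | boolean;

interface Pointer {
  local_context: string;
  target: UUID | Primitive;
  target_context?: string;
}

interface RhizomaticDelta {
  id: UUID;
  timestamp: Date;
  creator: UUID;
  host: UUID;
  transaction: UUID;
  pointers: Pointer[];
}

// Placeholder identifiers standing in for real UUIDs.
const keanu: UUID = "uuid-keanu";
const neo: UUID = "uuid-neo";
const the_matrix: UUID = "uuid-the-matrix";

const starring: RhizomaticDelta = {
  id: "uuid-delta-1",
  timestamp: new Date(),
  creator: "uuid-creator",
  host: "uuid-host",
  transaction: "uuid-tx-1",
  pointers: [
    { local_context: "actor", target: keanu, target_context: "roles" },
    { local_context: "role", target: neo, target_context: "actor" },
    { local_context: "film", target: the_matrix, target_context: "cast" },
    { local_context: "base_salary", target: 10000000 },
    { local_context: "salary_currency", target: "usd" },
  ],
};

type Entry = Record<string, UUID | Primitive>;
type View = Record<string, Entry[]>;

// Groups deltas under the target_context they declare for `entity`.
// Each entry maps local_context -> target for every pointer in that delta.
function viewFrom(entity: UUID, deltas: RhizomaticDelta[]): View {
  const view: View = {};
  for (const delta of deltas) {
    for (const pointer of delta.pointers) {
      if (pointer.target !== entity || pointer.target_context === undefined) continue;
      const entry: Entry = {};
      for (const p of delta.pointers) entry[p.local_context] = p.target;
      const key = pointer.target_context;
      const bucket = view[key] ?? [];
      bucket.push(entry);
      view[key] = bucket;
    }
  }
  return view;
}

console.log(Object.keys(viewFrom(keanu, [starring])).join(","));      // roles
console.log(Object.keys(viewFrom(the_matrix, [starring])).join(",")); // cast
```

The same delta shows up as one of Keanu’s “roles” and as part of the film’s “cast”, which is the job target_context does: it names the relationship from the target’s side, while local_context names it from the delta’s side.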

This lets me then do things at query time like “Grab all deltas that point to the keanu UUID in any way and integrate them into a single object.” If our delta above is included, then that object would look in part like this:

{
