Myk's Digital Garden
RhizomeDB

Understanding Rhizomes

This is the second essay in a series about databases, relationality and a new approach to managing data that I’m working on. This essay assumes you’ve either read or are broadly familiar with the first essay in the series.
So, okay. We have a sense now of how we got here, at least through a specific sort of historical lens, and the thing I want you to think about is this: what is a data lake? Specifically, is it a database? Well, soooort of, but not really, right? It’s just a bunch of stuff - the relationships between the “drops” in the “lake” are unstructured, and need to be specified somewhere else. Every “drop” in that lake is a one or a zero, and if you want to get anything out of it you need to bring your own glass, if you’ll pardon the metaphor.
In other words, it’s at its core the opposite of relational. In a way, a data lake is a massive No-SQL datastore where JSON documents live next to SQL dumps alongside stack traces. But it becomes relational because the relationships between the different sets of data within it are stored externally, right? It’s almost like, you’ve got this “lake” of “droplets” and someone somewhere has a giant map telling you exactly where in the lake any given droplet is, and maybe how it relates to other droplets.
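To make the “lake plus a map” picture concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the blob paths, the `catalog` format and the `related_rows` helper are assumptions, not how any particular data lake product works. The point is only that the lake itself holds opaque bytes, and every bit of meaning and relationship lives in a separate structure.

```python
import csv
import io
import json

# The "lake": opaque blobs keyed by path. Nothing in here says what a blob
# means or how it relates to any other blob.
lake = {
    "raw/movies.json": b'[{"id": 1, "title": "Alien"}, {"id": 2, "title": "Heat"}]',
    "raw/cast.csv": b"movie_id,person\n1,Sigourney Weaver\n2,Al Pacino\n",
    "raw/crash.log": b"Traceback (most recent call last): ...",
}

# The external "map" (a hypothetical catalog format, made up for this sketch):
# how to read each blob, and which fields relate one blob to another.
catalog = {
    "readers": {
        "raw/movies.json": lambda b: json.loads(b),
        "raw/cast.csv": lambda b: list(csv.DictReader(io.StringIO(b.decode()))),
    },
    "relations": [
        # (left blob, left field, right blob, right field)
        ("raw/cast.csv", "movie_id", "raw/movies.json", "id"),
    ],
}


def related_rows(lake, catalog):
    """Pair up rows across blobs using only what the catalog says about them."""
    results = []
    for left, left_key, right, right_key in catalog["relations"]:
        left_rows = catalog["readers"][left](lake[left])
        right_rows = catalog["readers"][right](lake[right])
        # CSV values are text and JSON ids are ints, so compare as strings.
        index = {str(row[right_key]): row for row in right_rows}
        for row in left_rows:
            match = index.get(str(row[left_key]))
            if match is not None:
                results.append((row, match))
    return results


pairs = related_rows(lake, catalog)
for cast_row, movie in pairs:
    print(cast_row["person"], "->", movie["title"])
# Sigourney Weaver -> Alien
# Al Pacino -> Heat
```

Notice that `raw/crash.log` has no entry in the catalog at all. It is still in the lake, but without a map entry it is just bytes.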
All of this is sort of decoupled, and these queries are complex and require a data engineering team to figure out how to work with them efficiently. But, if you squint and look at the data lake AND the structured queries run against it AND the engineering team - is THAT a database? Well, it sort of starts to look like one, right? But it’s so... abstract?
And this is where we need to understand the philosophical concept of a Rhizome.

Botany, Philosophy and Networks - oh my!

In Botany, a rhizome is an underground plant stem that grows horizontally instead of vertically, sending out roots and shoots along its length. Such a plant often has more than one “stem” or “shoot” above ground, and to anyone looking at the surface it seems like there are multiple plants growing next to each other. Dig under the surface, though, and you realize that actually it’s just one plant - that the illusion of distinct entities came from the fact that the underground structure was invisible.
If you’re poetically minded you may realize that this kind of structure lends itself particularly well to certain kinds of metaphors. We’ve all had experiences where we realize that things we thought were independent are actually aspects of the same underlying thing.
Post-Structuralist philosophers Deleuze and Guattari famously popularized the idea of a “rhizome” in philosophy, in a way that’s so technical that I’m not going to try to explain it here exactly. I’ll just say that it’s similarly a structure that manifests as seemingly unrelated things until you look at it from a certain point of view and see that every part of it is actually defined in terms of all of the rest of it. To them, a rhizome is a network where any point is connected to any other point.
For our purposes, I think it will suffice to say that a “rhizome” is an intrinsically relational structure that gives rise to parts which can be treated as stand-alone entities for certain purposes. Do you see how our example movies database, from way up this document, can be thought of as a single rhizome, where the “shoots” or “stems” are things like Characters, People or Movies? But how each of those entities is actually fully integrated into and connected with all of the others?

Relationality as Rhizomatic

In our history above we described data-lakes as more similar to No-SQL databases than to relational databases - until we start adding in things like our indexes, structured queries and engineering team, at which point the gestalt of all of that starts to look more like a relational database.
The key insight here is that a database is relational when enough of the underlying rhizome is preserved somewhere. What differentiates our three types of datastore - SQL, No-SQL and Data Lake - is how each one treats the rhizome.
Three Types of Data Store

| Data Store Type | Structure | Rhizome | Notes |
| --- | --- | --- | --- |
| RDBMS | Structured | Integrated | In a traditional SQL database normalized across tables that rhizome is immanent and extant within the schema of the database. You don’t need any external indexes, you don’t need institutional knowledge, you don’t need specifically tailored queries - the rhizome exists within the structure of the database itself. |
| No-SQL | Unstructured | Discarded | A No-SQL database like CouchDB dispenses with the normalization and skips the relationality - as a result, it loses its rhizomatic aspect. Your data is no longer intelligible as a deeply interconnected network expressing itself as seemingly independent things - you’ve gone ahead and modeled it as actually independent things. |
| Data Lake | Partially Structured | External | A data-lake with an engineering team behind it and modern tooling takes the flexibility of a No-SQL store but reconnects the rhizomatic structure by migrating the relationality out into the indexes, queries and institutional knowledge required to work with it effectively. |
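One way to see what “the rhizome exists within the structure of the database itself” means in practice: in a relational store you can ask the database which tables connect to which, with no outside documentation. Here is a rough sketch using Python’s built-in `sqlite3` module. The table names borrow from the movies example (People, Movies, Characters), but the columns and the `schema_relations` helper are my own assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE characters (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        movie_id INTEGER NOT NULL REFERENCES movies(id),
        person_id INTEGER NOT NULL REFERENCES people(id)
    );
""")


def schema_relations(conn):
    """Return (table, column, referenced_table, referenced_column) tuples,
    discovered purely by asking the database about its own schema."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    relations = []
    for table in tables:
        # Each PRAGMA foreign_key_list row is: id, seq, table, from, to, ...
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            relations.append((table, fk[3], fk[2], fk[4]))
    return sorted(relations)


for rel in schema_relations(conn):
    print("%s.%s -> %s.%s" % rel)
# characters.movie_id -> movies.id
# characters.person_id -> people.id
```

There is no equivalent question you can put to a pile of documents or a bucket of files. For those, the answer lives in somebody’s catalog, query library or head.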


Data Lakes - large partially structured pools of disparate collections of data unified at query-time via externally imposed structures - are an attempt to combine the flexibility of a No-SQL database with the rigor and capability of a relational store. The technologies we have here work - but there are large trade-offs. Because the rhizomatic aspect has to be tracked externally we lose some of the innate flexibility of a truly relational structure.
A single cell in a SQL database can be compared to every other cell in every other row in every other table in a deterministic, integrated and, crucially, fast way. If you know SQL, you are able to jump into any relational database and start doing productive things. In a data lake, however, you need to learn how this specific system works: which relations are managed, where, and how. You need to manage external indexes, and you introduce an entire social domain into your rhizome, because these things are partially held together via conventions that aren’t captured intrinsically in the structure.
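Here is a small sketch of that “jump in and start doing productive things” property, again using Python’s built-in `sqlite3` module. The schema is an assumed stand-in for the movies example, not a definitive model. Both queries start from one kind of entity and arrive at another using nothing but plain SQL joins over the shared structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE characters (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        movie_id INTEGER NOT NULL REFERENCES movies(id),
        person_id INTEGER NOT NULL REFERENCES people(id)
    );
    INSERT INTO people VALUES
        (1, 'Sigourney Weaver'), (2, 'Al Pacino'), (3, 'Robert De Niro');
    INSERT INTO movies VALUES (1, 'Alien'), (2, 'Heat');
    INSERT INTO characters VALUES
        (1, 'Ellen Ripley', 1, 1),
        (2, 'Vincent Hanna', 2, 2),
        (3, 'Neil McCauley', 2, 3);
""")

# Start at a movie title and walk through characters to people.
cast = conn.execute("""
    SELECT people.name, characters.name
    FROM movies
    JOIN characters ON characters.movie_id = movies.id
    JOIN people ON people.id = characters.person_id
    WHERE movies.title = ?
    ORDER BY people.name
""", ("Heat",)).fetchall()
print(cast)
# [('Al Pacino', 'Vincent Hanna'), ('Robert De Niro', 'Neil McCauley')]

# Start at people and arrive back at people: who shares a movie with whom?
co_stars = conn.execute("""
    SELECT p1.name, p2.name
    FROM characters AS c1
    JOIN characters AS c2
      ON c1.movie_id = c2.movie_id AND c1.person_id < c2.person_id
    JOIN people AS p1 ON p1.id = c1.person_id
    JOIN people AS p2 ON p2.id = c2.person_id
""").fetchall()
print(co_stars)
# [('Al Pacino', 'Robert De Niro')]
```

Neither query needed a catalog, a data engineer or any tribal knowledge. The keys in the tables were enough. In a data lake the same two questions would each need someone to know which files hold which facts and how their identifiers line up.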
