the graph data model

you write nested. Zef stores flat. the graph data model is the glue between.

the eternal OOP / SQL divorce

In code, we think nested:

class User:
    name: str
    emails: list[str]

alice = User(name='Alice', emails=['a@x', 'a@y'])

In SQL, we must shred it flat — emails can't live in a column, so we make a second table and wire them with foreign keys.

users table       emails table
┌────┬───────┐    ┌─────────┬───────┐
│ id │ name  │    │ user_id │ email │
├────┼───────┤    ├─────────┼───────┤
│ 1  │ Alice │    │ 1       │ a@x   │
└────┴───────┘    │ 1       │ a@y   │
                  └─────────┴───────┘

→ N+1 queries, ORM plumbing, migrations every time the shape changes.
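To make the shredding concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the diagram; everything else (in-memory database, the reassembly step) is an illustrative assumption, not something Zef requires.

```python
import sqlite3

# In-memory database; table and column names mirror the diagram.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE emails (user_id INTEGER NOT NULL REFERENCES users(id),
                         email   TEXT NOT NULL);
""")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO emails (user_id, email) VALUES (?, ?)",
    [(1, "a@x"), (1, "a@y")],
)

# Reassembling the nested object takes a JOIN (or one extra query per user).
rows = conn.execute("""
    SELECT u.name, e.email
    FROM users u JOIN emails e ON e.user_id = u.id
    ORDER BY e.email
""").fetchall()

alice = {"name": rows[0][0], "emails": [email for _, email in rows]}
print(alice)  # {'name': 'Alice', 'emails': ['a@x', 'a@y']}
```

The nested shape only comes back after the JOIN and a manual regrouping step, which is the plumbing an ORM usually hides.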

zef's answer: write nested, store graph

the grand trick

You write:

ET.User(name='Alice', emails_=['a@x', 'a@y'])

Zef stores:

(User) ──[name]────▶ ("Alice")
   │
   ├──[emails]──▶ ("a@x")
   └──[emails]──▶ ("a@y")

Distinct nodes for the User, for "Alice", for each email. The User node is a bare identity; the edges carry the information.

Mental model rules:

Nesting in code ↔ edges in storage
field=value ↔ (node) --[field]--> value-node
Every value ("Alice", 30, True) becomes its own value-node
Two Users with the same name still point to the same "Alice" value-node
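The rules above can be sketched in plain Python. This is not Zef's implementation: `declare` is a hypothetical helper, node ids like `User#1` are made up, and reading a trailing underscore as "plural field" is an inference from the `emails_=[...]` example.

```python
from itertools import count

_ids = count(1)

def declare(entity_type, **fields):
    """Flatten one nested declaration into (source, relation, target) edges."""
    node = f"{entity_type}#{next(_ids)}"   # bare identity, no data inside
    edges = []
    for field, value in fields.items():
        # Assumption: a trailing underscore marks a plural field.
        if field.endswith("_"):
            relation, targets = field[:-1], value
        else:
            relation, targets = field, [value]
        for target in targets:
            # The Python value itself stands in for the value-node,
            # so equal values are automatically the same node.
            edges.append((node, relation, target))
    return node, edges

user, edges = declare("User", name="Alice", emails_=["a@x", "a@y"])
for edge in edges:
    print(edge)
# ('User#1', 'name', 'Alice')
# ('User#1', 'emails', 'a@x')
# ('User#1', 'emails', 'a@y')
```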

entities vs values

	entity node	value node
role	identity anchor	data carrier
e.g.	`ET.User`, `ET.Page`	`"Alice"`, `42`, `True`
internal structure	none — bare identity	the data
two equal ones are…	still distinct (different identity)	the same node (by value)

This is the Wittgenstein-Tractatus model: "objects are simple" — they carry no intrinsic structure. Structure emerges from the edges between them.
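The last row of the table is the one that tends to surprise people, so here is a small plain-Python analogy. The `Entity` class and `value_node` function are hypothetical stand-ins, not Zef types.

```python
class Entity:
    """An identity anchor: compared by identity only, carries no data."""
    def __init__(self, entity_type):
        self.entity_type = entity_type   # a type tag, not data about the thing

u1, u2 = Entity("User"), Entity("User")
print(u1 == u2)   # False: two entities of the same type are still distinct

# Value nodes are keyed by value, so equal values collapse into one node.
value_nodes = {}

def value_node(v):
    # Key on (type, value) so that 1 and True stay distinct nodes.
    return value_nodes.setdefault((type(v), v), len(value_nodes))

a1 = value_node("Alice")
a2 = value_node("Alice")
print(a1 == a2, len(value_nodes))   # True 1
```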

relations carry everything

In Zef, what an entity has isn't stored inside it. It's stored on the edges out of it. Each edge has this shape:

(source entity) ──[RT.relation_name]──▶ (target node)

RT is "relation type", the cousin of ET. Same rules apply — you invent relation types on the fly.

the "zero, one, infinity" rule

Physically, every field is a set of outgoing edges. It might have 0, 1, or 37 edges — same storage mechanism.

user | Out[RT.nickname] | collect   # []    no edges
user | Out[RT.name]     | collect   # ['Alice']   one edge
user | Out[RT.email]    | collect   # ['a@x', 'a@y']   two edges

no migrations for cardinality changes

Going from "one name" to "many names" isn't a schema change. The storage is already "many." You just start writing / reading the plural form.

Contrast with SQL: ALTER TABLE users DROP COLUMN name; CREATE TABLE user_names (...); plus a data migration plus ORM retooling.
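A toy edge store shows why cardinality is not a schema concern. This is a sketch under simple assumptions (a dict of sets, hypothetical `add` and `out` helpers), not how Zef stores data internally.

```python
from collections import defaultdict

# (source, relation) -> set of targets; an absent key is simply the empty set.
edges = defaultdict(set)

def add(source, relation, target):
    edges[(source, relation)].add(target)

def out(source, relation):
    # .get avoids creating empty entries as a side effect of reading.
    return sorted(edges.get((source, relation), set()))

add("user", "name", "Alice")
add("user", "email", "a@x")
add("user", "email", "a@y")

print(out("user", "nickname"))  # []
print(out("user", "name"))      # ['Alice']
print(out("user", "email"))     # ['a@x', 'a@y']

# "One name" becomes "many names" by adding an edge; nothing else changes.
add("user", "name", "Alicia")
print(out("user", "name"))      # ['Alice', 'Alicia']
```

Zero, one, and many all go through the same `out` call; only the number of stored edges differs.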

building a graph

A Graph is a value (like a list). You construct one from entity declarations:

g = Graph([
    ET.Person(
        '🍃-01abc...',
        name='Alice',
        lives_in=ET.City(
            '🍃-02def...',
            name='Berlin',
            population=3_600_000,
        ),
        friend_=[
            ET.Person('🍃-03ghi...', name='Bob'),
            ET.Person('🍃-04jkl...', name='Carol'),
        ],
    ),
])

# commit to the graph store (so we can query it later)
g.add_to_graph_store()

Zef normalizes this tree of declarations into distinct entities + edges. The "Berlin" city is a single node, even if 100 Persons live there. The same holds for "Alice" as a value: it is reused wherever it appears.
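The normalization step can be approximated in a few lines of plain Python. The dict-based declaration format, the short ids (`p1`, `c1`), and the `normalize` function are all hypothetical; the point is only that a shared id yields a shared node.

```python
def normalize(decl, nodes, edges):
    """Walk a nested declaration and return its node id.

    decl is a hypothetical stand-in: a dict with 'type', 'id' and 'fields'.
    """
    nodes[decl["id"]] = decl["type"]            # same id -> same node
    for relation, value in decl["fields"].items():
        targets = value if isinstance(value, list) else [value]
        for target in targets:
            if isinstance(target, dict):        # nested entity: recurse
                target = normalize(target, nodes, edges)
            edges.add((decl["id"], relation, target))
    return decl["id"]

berlin = {"type": "City", "id": "c1", "fields": {"name": "Berlin"}}
alice = {"type": "Person", "id": "p1",
         "fields": {"name": "Alice", "lives_in": berlin}}
bob = {"type": "Person", "id": "p2",
       "fields": {"name": "Bob", "lives_in": berlin}}

nodes, edges = {}, set()
for person in (alice, bob):
    normalize(person, nodes, edges)

print(sorted(nodes))   # ['c1', 'p1', 'p2']: Berlin appears once
print(len(edges))      # 5
```

Berlin is declared inside both Persons but ends up as one node with one `name` edge, because the edge set deduplicates the repeated triple.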

reading back

# all Persons in the graph
people = g | all(ET.Person) | collect

# find Alice
alice = g | all(ET.Person) | filter(F.name == 'Alice') | first | collect

# her friends' names
names = alice | Fs.friend | map(F.name) | collect
# {'Bob', 'Carol'}

# her city
city_name = alice | F.lives_in | F.name | collect     # 'Berlin'

graphs are VALUES

A Graph is just another value — like a string or a dict. You can:

Construct one from a list of entity declarations
Store it to disk as a .zef file
Send it over a socket without serialization
Compare two graphs for equality
Build a DB state from a graph (we'll see this in chapter 16)
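One way to picture "a graph is a value" is an immutable set of triples. This is only an analogy in plain Python (the `graph` helper is hypothetical); it says nothing about how Zef represents or compares graphs.

```python
def graph(*triples):
    # An immutable set of (source, relation, target) triples.
    return frozenset(triples)

g1 = graph(("p1", "name", "Alice"), ("p1", "friend", "p2"))
g2 = graph(("p1", "friend", "p2"), ("p1", "name", "Alice"))
g3 = g1 | {("p2", "name", "Bob")}

print(g1 == g2)   # True: same content, different construction order
print(g1 == g3)   # False
print(g1 < g3)    # True: g1 is a proper subset of g3
```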

the SQL mapping, one more time

SQL	Zef
table	entity type (`ET.Person`)
row	entity node
column	relation type (`RT.name`)
cell	target value node
foreign key	edge
NULL	edge absent (empty set)
JOIN	follow the edge

what you lose (and gain)

Lose: compile-time schema enforcement. There's no DDL.

Gain: evolution without migrations. Sparse, optional fields for free. Many-to-many without join tables. Graph traversals that read like English. Refinement types when you do want constraints.

graph-ref anatomy

When you pull an entity from a stored graph, you get a graph reference:

alice_ref = g | all(ET.Person) | first | collect
print(alice_ref)   # ET.Person('🕸️-1-...')

The 🕸️- prefix means "graph-local reference." It points to a specific node in this specific graph. Chain ZefOps to traverse out from it.

putting it all together

g = Graph([
    ET.User('🍃-001a...', name='Alice', age=30,
            email_={'a@x.com', 'a@y.com'}),
    ET.User('🍃-002b...', name='Bob',   age=25),
    ET.User('🍃-003c...', name='Carol', age=42,
            email_={'c@x.com'}),
])
g.add_to_graph_store()

# Which users have zero emails?
g | all(ET.User) | filter(Fs.email | length == 0) | map(F.name) | collect
# ['Bob']

# average age of users with emails
# parentheses let the pipeline span several lines in Python
(
    g | all(ET.User)
      | filter(Fs.email | length > 0)
      | map(F.age) | apply({'sum': reduce(add), 'n': length})
      | apply(Z['sum'] / Z['n']) | collect
)
# 36.0
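As a cross-check on the two results, the same questions can be asked of plain Python data. The dicts below are an illustrative stand-in for the graph, with a missing "email" key playing the role of "no edges".

```python
# Same three users as plain dicts.
users = [
    {"name": "Alice", "age": 30, "email": {"a@x.com", "a@y.com"}},
    {"name": "Bob",   "age": 25},
    {"name": "Carol", "age": 42, "email": {"c@x.com"}},
]

# Which users have zero emails?
no_email = [u["name"] for u in users if not u.get("email")]

# Average age of users with at least one email.
ages = [u["age"] for u in users if u.get("email")]
avg_age = sum(ages) / len(ages)

print(no_email)  # ['Bob']
print(avg_age)   # 36.0
```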

visualize it

For the graph above, draw the nodes and edges on a napkin. How many distinct value-nodes are there for "Alice", "Bob", "Carol"? (Hint: 3.) How many distinct email value-nodes? (Hint: 3.)

Next up: updating graphs — field= vs field_= vs this+… →