
workers & isolation

spawn a whole process when a thread isn't enough.

actors vs workers

|                 | actors                         | workers                                      |
|-----------------|--------------------------------|----------------------------------------------|
| unit            | Tokio task (in same process)   | Python process (separate)                    |
| isolation       | shared memory                  | OS-level separation                          |
| fault tolerance | crash = uncaught panic         | crash = process exits, main survives         |
| overhead        | ~100 ns/msg                    | ~ms for IPC                                  |
| GIL             | Python on hot path → GIL-bound | each process has its own GIL                 |
| use for         | concurrency, state machines    | heavy CPU, untrusted code, crash containment |

the basic shape

worker = FX.StartWorkerProcess() | run

result = FX.SendCommandToWorkerProcess(
    process=worker,
    command=FX.Print(content='hello from worker!'),
) | run

FX.StopWorkerProcess(process=worker) | run

The command is an FX value. The worker receives it, runs it, sends back the result. All over a binary protocol — no JSON, no serialization tax.
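To make the "no serialization tax" point concrete: a length-prefixed binary frame is the classic way to ship raw bytes with no encoding overhead. The frame layout below is a generic illustration of the idea, not Zef's actual wire format:

```python
import struct

def frame(payload: bytes) -> bytes:
    # Length-prefixed frame: 4-byte big-endian length, then the raw payload.
    # (Illustrative only; Zef's real protocol is not documented here.)
    return struct.pack('>I', len(payload)) + payload

def unframe(data: bytes) -> bytes:
    # Read the 4-byte length, then slice out exactly that many payload bytes.
    (length,) = struct.unpack('>I', data[:4])
    return data[4:4 + length]

msg = b'\x01\x02hello'           # arbitrary bytes, no JSON escaping needed
assert unframe(frame(msg)) == msg
```

The receiver never parses text; it reads a fixed-size header and copies bytes, which is why binary framing stays cheap even for large payloads.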

worker = a python with a zef runtime
main process                  worker process
────────────                  ──────────────
FX.Start...      ───spawn──▶  (zef runtime)
FX.SendCommand   ───bytes──▶  run command
                 ◀──bytes───  send result
FX.Stop...       ──────────▶  exit

The worker is a full-fledged Python process that happens to have the Zef runtime loaded. Send it FX values; it runs them.

why it matters

fault containment

A segfault in the worker doesn't take down the main process. You get the crash info and can spawn a new worker.

worker = FX.StartWorkerProcess() | run

try:
    result = FX.SendCommandToWorkerProcess(
        process=worker,
        command=potentially_dangerous_fx,
    ) | run
except WorkerCrash as e:
    print(f'worker died: {e}')
    worker = FX.StartWorkerProcess() | run   # respawn
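The try/except/respawn pattern above generalizes into a small retry helper. A sketch in plain Python, where `spawn` and `send` are injected callables standing in for the FX calls (the helper name and signature are illustrative, not part of the Zef API):

```python
def send_with_respawn(worker, command, *, spawn, send, crash_exc=Exception, retries=1):
    """Send `command` to `worker`; on crash, respawn a fresh worker and retry.

    Returns (worker, result) so the caller keeps a handle to whichever
    worker ended up serving the command.
    """
    for attempt in range(retries + 1):
        try:
            return worker, send(worker, command)
        except crash_exc:
            if attempt == retries:
                raise          # out of retries: surface the crash
            worker = spawn()   # replace the dead worker and try again

# Demo with a send that dies once, then succeeds:
calls = {'n': 0}
def flaky_send(w, cmd):
    calls['n'] += 1
    if calls['n'] == 1:
        raise RuntimeError('worker died')
    return cmd.upper()

w, result = send_with_respawn('w0', 'hi', spawn=lambda: 'w1', send=flaky_send)
assert (w, result) == ('w1', 'HI')
```

With Zef, `spawn` would wrap `FX.StartWorkerProcess() | run` and `crash_exc` would be `WorkerCrash`; the control flow is the same.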

parallel CPU-bound work

Spawn several workers and fan the work out across them. Because each worker process has its own GIL, the chunks genuinely run in parallel.

workers = [FX.StartWorkerProcess() | run for _ in range(4)]

chunks = split_data_into_chunks(dataset, 4)

results = [
    FX.SendCommandToWorkerProcess(
        process=w,
        command=process_chunk(chunk),   # a zef_function returning an FX
    ) | run
    for w, chunk in zip(workers, chunks)
]
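`split_data_into_chunks` is assumed above but not defined; a minimal version that cuts the dataset into n roughly equal contiguous chunks could look like this (the name and chunking strategy are assumptions):

```python
def split_data_into_chunks(data, n):
    # Cut `data` into n contiguous chunks whose sizes differ by at most 1.
    k, r = divmod(len(data), n)   # base chunk size, and how many get one extra
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

split_data_into_chunks(list(range(10)), 4)
# -> [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Contiguous chunks keep each worker's slice cache-friendly; for work items with wildly varying cost, round-robin dealing spreads the load more evenly.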

untrusted code sandbox

Run code from users (plugins, user scripts) in a separate process with a restricted FX whitelist. (This is a planned/early-stage feature β€” check current capabilities.)

listing / stopping

# how many workers are alive?
FX.ListWorkerProcesses() | run

# stop one
FX.StopWorkerProcess(process=worker) | run

sending any FX value

The command can be any Zef value β€” typically an FX or a list of them:

FX.SendCommandToWorkerProcess(
    process=worker,
    command=[
        FX.Print(content='start'),
        FX.HTTPRequest(url='https://api.example.com/big-job'),
        FX.Print(content='done'),
    ],
) | run

when NOT to use workers

Use workers when you specifically need isolation. For everything else, actors are faster and simpler.

a concrete example β€” image processing

@zef_function
def resize_image(img_bytes: Bytes) -> Bytes:
    from PIL import Image
    import io
    im = Image.open(io.BytesIO(bytes(img_bytes)))
    im.thumbnail((200, 200))
    buf = io.BytesIO()
    im.convert('RGB').save(buf, 'JPEG')   # JPEG has no alpha; convert RGBA/P first
    return buf.getvalue()

# in main process β€” spawn a worker pool
POOL = [FX.StartWorkerProcess() | run for _ in range(4)]

def resize_batch(imgs):
    return [
        FX.SendCommandToWorkerProcess(
            process=POOL[i % len(POOL)],
            command=resize_image(img),
        ) | run
        for i, img in enumerate(imgs)
    ]

CPU-bound image resizing, 4x parallel, each in its own process. If PIL crashes on some weird jpeg, one worker dies and you restart it; the main process keeps serving other images.

remote workers β€” the next tier

Zef also supports workers on other machines via Remote Containers:

FX.InstallPodmanRemote(ip_address='5.223.58.246') | run
FX.BuildZefImageRemote(ip_address='5.223.58.246', wheel='...whl') | run
FX.RunZefContainerRemote(ip_address='5.223.58.246', session=True) | run

Same programming model: send commands, receive results. Just happens over the network.

effects as transport-independent values

Because effects are data, the same FX.* value can be run locally, in a worker, on another machine, or stored for later. The runtime picks the transport. You write the effect once.
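The effects-as-data idea can be sketched in plain Python: describe the effect as a value, and let separate runners decide where it executes. This is a conceptual sketch only, not Zef's implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PrintEffect:
    # An effect is plain data: it describes WHAT to do, not how or where.
    content: str

def run_locally(effect: PrintEffect) -> str:
    return f'local: {effect.content}'

def run_in_worker(effect: PrintEffect) -> str:
    # Stand-in for serializing the same value and shipping it to a worker.
    return f'worker: {effect.content}'

eff = PrintEffect(content='hello')             # written once...
assert run_locally(eff) == 'local: hello'      # ...run here
assert run_in_worker(eff) == 'worker: hello'   # ...or there, unchanged
```

Because `eff` is an inert value, nothing happens until a runner interprets it, which is exactly what lets the same effect travel between process, worker, and machine.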

the three-tier model

┌───────────────────────────────────┐
│ TIER 1  pipelines & pure ops      │ ← your ordinary code
│         fast, shared memory,      │
│         no GIL                    │
└───────────────────────────────────┘
┌───────────────────────────────────┐
│ TIER 2  actors                    │ ← concurrent workers
│         Tokio tasks, msg passing  │   same process
└───────────────────────────────────┘
┌───────────────────────────────────┐
│ TIER 3  worker processes          │ ← isolation, parallelism
│         OS-level separation       │   different processes
│         (local or remote)         │   (possibly different machines)
└───────────────────────────────────┘

Use the lightest tier that gets the job done. Jump to a heavier tier only when you need what it specifically provides.

reflection

Which tier would you use for each of these?

  1. Parsing 1,000,000 JSON strings
  2. Running untrusted user-submitted Python
  3. A chatroom with 100 clients
  4. Computing PI to 1M digits on 8 cores

answers

  1. Tier 1 — pure ops in a pipeline
  2. Tier 3 — untrusted code needs isolation
  3. Tier 2 — stateful per-client actors
  4. Tier 3 — without separate processes, the GIL serializes all 8 cores

Next up: the finale — an end-to-end LLM chat agent. →