WIP: Reactivity in Distributed Systems

The way we build multi-tier applications could be much better if we think carefully about the fundamentals of computation.

How many lines of code does it take to build a web-based todo list for personal use, backed by localStorage and without any authentication? Not many. In fact, there is a whole catalog of such apps built with different frameworks.

Now let me up the ante. How many lines of code does it take to build a multi-user todo-list application with task sharing, real-time synchronization, and collaboration? Pick your favorite programming language and framework and tell me the number.

Why is building multi-tier web applications so cumbersome? And why haven't we made it any easier, when most users now live in a social, multi-device world where this is table stakes?

After 20 years of doing software engineering, I have grown annoyed enough with the schlep involved in building simple web apps to start looking for better ideas.

Multi-tier apps are distributed systems

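A typical web app is split across a database, an app server, and a client, usually running on different machines and talking over a network.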

Reactive programming has won in UI

React, Svelte, and Vue all arrived at similar solutions to their problems: the UI is declared as a function of state, and the framework updates it when that state changes.

Reactivity primitives are missing in Distributed Systems

Everyone is solving a special case of the same general problem, ad hoc, in multiple domains: UI, database replication, sync engines, LiveViews, games, etc.

Dan Abramov (a core developer of React) asked a very similar question in his blog post "The Two Reacts".

But he did not propose any answers.

💡The problem is the assignment operator

I mulled this over for a long while, even staring at the way we write reactive expressions. Then I had an "aha!" moment.

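The assignment operator is the problem here. It is too blunt: an assignment like x = f(y) replaces the whole value of x. When the computation and the value live far apart, say on a server and a client, every change means transmitting the whole value again, even if only a tiny part of it changed.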

This is the source of all our troubles.
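To make the cost concrete, here is a minimal sketch in JavaScript, assuming a hypothetical in-memory list of 1,000 todos and JSON as the wire format. It compares what a replica would have to receive under assignment semantics (the whole new value) with an operation that only names the path that changed, in a { path, op, value } shape similar to the one used in the todo-list sketch. The numbers are illustrative, not a benchmark.

```javascript
// Illustrative data: 1,000 hypothetical todos keyed by id
const todos = {};
for (let i = 0; i < 1000; i++) {
  todos[`task-${i}`] = { task: `Task number ${i}`, status: 'pending' };
}

// Assignment semantics: the new value replaces the old one wholesale,
// so a remote replica has to receive the entire serialized value.
function fullAssignmentPayload(newValue) {
  return JSON.stringify(newValue);
}

// Operation semantics: describe only what changed.
function opPayload(path, op, value) {
  return JSON.stringify({ path, op, value });
}

// Mark one task as done, producing a new value without mutating the old one
const updated = { ...todos, 'task-42': { ...todos['task-42'], status: 'done' } };

const fullBytes = fullAssignmentPayload(updated).length; // ASCII-only JSON, so chars == bytes
const opBytes = opPayload('/task-42/status', 'replace', 'done').length;

console.log(`assignment ships ${fullBytes} bytes, the op ships ${opBytes}`);
```

The gap grows with the size of the value, while the op stays roughly constant.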

💡CRDTs work beautifully for merging state

Sync engines are starting to realize this, but they still frame it as the problem of syncing documents.
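For readers new to CRDTs, a grow-only counter (G-Counter) is about the simplest example. The sketch below is a textbook state-based G-Counter, not code from any particular sync engine: each replica increments only its own slot, and merging takes the per-replica maximum, so replicas converge regardless of message order or duplication.

```javascript
// A grow-only counter (G-Counter), one of the simplest state-based CRDTs
class GCounter {
  constructor(replicaId) {
    this.replicaId = replicaId;
    this.counts = {}; // replicaId -> number of increments made by that replica
  }

  // Each replica only ever bumps its own slot
  increment(n = 1) {
    if (n < 0) throw new Error('a G-Counter can only grow');
    this.counts[this.replicaId] = (this.counts[this.replicaId] || 0) + n;
  }

  value() {
    return Object.values(this.counts).reduce((sum, c) => sum + c, 0);
  }

  // Per-replica max is commutative, associative and idempotent,
  // so replicas converge no matter how often or in what order they sync.
  merge(other) {
    for (const [id, c] of Object.entries(other.counts)) {
      this.counts[id] = Math.max(this.counts[id] || 0, c);
    }
  }
}

const laptop = new GCounter('laptop');
const phone = new GCounter('phone');
laptop.increment(2); // two tasks completed offline on the laptop
phone.increment(3);  // three tasks completed offline on the phone

laptop.merge(phone);
phone.merge(laptop);
phone.merge(laptop); // merging again changes nothing

console.log(laptop.value(), phone.value()); // 5 5
```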

What if our model of distributed computation were built out of CRDTs?

What if we annotated every computation and every piece of state with the zone it lives in? Furthermore, what if the type of reactive state were a CRDT?
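A minimal sketch of both ideas, assuming a hypothetical API: a last-writer-wins register that carries the zone it lives in and notifies subscribers whenever a write or merge actually changes its value. The class name, the zone field and the stamp format (a logical clock with the replica id as tie-breaker) are illustrative choices, not an existing library. The zone is plain metadata here, which a runtime could use to decide which nodes host the state.

```javascript
// Total order on write stamps: higher logical time wins, replica id breaks ties
function wins(a, b) {
  return a.time > b.time || (a.time === b.time && a.replica > b.replica);
}

class ReactiveLWWRegister {
  constructor(zone, replicaId, initial) {
    this.zone = zone; // metadata only: which zone's nodes host this state
    this.replicaId = replicaId;
    this.value = initial;
    this.stamp = { time: 0, replica: replicaId };
    this.subscribers = new Set();
  }

  // Register a callback; returns a function that unsubscribes it
  subscribe(fn) {
    this.subscribers.add(fn);
    return () => this.subscribers.delete(fn);
  }

  // Local write: advance the logical clock past the latest winning stamp
  set(value) {
    this.#apply(value, { time: this.stamp.time + 1, replica: this.replicaId });
  }

  // Incoming remote state: keep whichever write has the larger stamp
  merge(other) {
    this.#apply(other.value, other.stamp);
  }

  #apply(value, stamp) {
    if (!wins(stamp, this.stamp)) return; // stale or duplicate write: no change, no notification
    this.value = value;
    this.stamp = stamp;
    for (const fn of this.subscribers) fn(value);
  }
}

const a = new ReactiveLWWRegister('client', 'a', '');
const b = new ReactiveLWWRegister('client', 'b', '');
const seen = [];
b.subscribe(v => seen.push(v));

a.set('buy milk');
b.merge(a); // b adopts the newer write and notifies its subscriber
b.merge(a); // duplicate delivery: ignored
console.log(b.value, seen.length); // buy milk 1
```

Reactivity falls out of the merge: subscribers only run when the CRDT's state actually changes.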

In fact, we can go even a step further. Since functions are just values, we can think of the entire distributed program as a self-modifying CRDT map.
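A rough illustration, under loud assumptions: a hypothetical last-writer-wins map keyed by paths, where a value may be a function. Deploying a new version of a computation then becomes just another write that merges like any other. Everything runs in one process here; actually shipping functions between machines would need some form of code serialization, which this sketch ignores.

```javascript
// Hypothetical LWW map: each key holds { value, time, replica }; values may be functions
class LWWMap {
  constructor(replicaId) {
    this.replicaId = replicaId;
    this.clock = 0;
    this.entries = new Map();
  }

  set(key, value) {
    this.clock += 1;
    this.#put(key, { value, time: this.clock, replica: this.replicaId });
  }

  get(key) {
    return this.entries.get(key)?.value;
  }

  // Merge another replica's entries key by key
  merge(other) {
    for (const [key, entry] of other.entries) this.#put(key, entry);
  }

  #put(key, entry) {
    this.clock = Math.max(this.clock, entry.time); // Lamport-style clock advance
    const current = this.entries.get(key);
    const newer = !current || entry.time > current.time ||
      (entry.time === current.time && entry.replica > current.replica);
    if (newer) this.entries.set(key, entry);
  }
}

const dev = new LWWMap('dev');
const prod = new LWWMap('prod');

dev.set('/client/format', todo => `${todo.task} - ${todo.status}`);
prod.merge(dev);

// Ship a new version of the computation by writing a new function value
dev.set('/client/format', todo => `[${todo.status}] ${todo.task}`);
prod.merge(dev);

console.log(prod.get('/client/format')({ task: 'buy milk', status: 'pending' }));
// [pending] buy milk
```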

What might a multi-user, multi-tier todo-list app look like?

const todoapp = {
    // Nodes are organized by zones. Each piece of state and computation needs to be located in a specific zone
    // We assume that all nodes in the same zone behave identically regarding partial state, local computation and communication
    server: {
        all_todos: {
            // Type of state-based/op-based/delta-based CRDT to use: Doc, Counter, LWWRegister, AWSet, POLog, Map, List, Peritext, etc.
            // Different types impose different requirements on nodes and comm channels, and have different performance characteristics and consistency guarantees
            type: "crdt.op.doc", 
            // init is called whenever a new node joins the network in this zone. E.g., a db node might load its state from disk when it starts.
            // Here we're using a two-level nested map as our in-memory database of tasks: { user_id: { task_id: todo_object } }
            init: () => ({}) 
        }, 
        // computations can only access the local state and must produce results of type that matches the target variable's CRDT type
        // event will include session information (e.g., user_id, device_id, user_role) that can be used to implement access control
        filter_for_user: (server, event) => ({path: '/client/my_todos', op: 'replace', value: server.all_todos[event.user_id] || {}}),
        insert_todo: (server, event) => {
            // crypto.randomUUID() is part of the Web Crypto API, available as a global in modern browsers and recent Node.js
            const random_uuid = crypto.randomUUID();
            return {path: `/server/all_todos/${event.user_id}/${random_uuid}`, op: 'insert', value: {task: event.task, status: 'pending'}};
        },
        toggle_todo: (server, event) => ({path: `/server/all_todos/${event.user_id}/${event.todo_id}/status`, op: 'rotate'}),
        delete_todo: (server, event) => ({path: `/server/all_todos/${event.user_id}/${event.todo_id}`, op: 'delete'}),
        modify_todo: (server, event) => ({path: `/server/all_todos/${event.user_id}/${event.todo_id}/task`, op: 'replace', value: event.task}),
    },
    client: {
        // Derived values are computed by functions
        // these will be initialized when this node begins and connects (will pull/push if across zones)
        // If its type is an op-based/delta-based CRDT, then it can also be updated by other functions using ops/deltas
        // Otherwise, full state replacement based assignment is assumed whenever the dependencies change
        // TODO: Where is Reactivity?
        my_todos: {
            type: 'derived.map',
            compute: '/server/filter_for_user'
        },
        ui: {
            type: 'derived.crdt.state.web_ui',
            dependencies: ['/client/my_todos', '/client/task_input'],
            compute: '/client/render'
        },
        task_input: {
            type: "string",
            init: () => ""
        },
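        render: (client, event) => ({
            path: '/client/ui',
            op: 'replace',
            // my_todos is a map of { task_id: todo }, so render its values and join them into one HTML string
            value: '<ul>' + Object.values(client.my_todos).map(todo => `<li>${todo.task} - ${todo.status}</li>`).join('') + '</ul>'
                        // Event wiring is schematic: clicking "Add" should dispatch /server/insert_todo with the current task_input
                        + `<div><input value="${client.task_input}" type="text"/><button onclick="insert_todo">Add</button></div>`,
        })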
    },
    // Describe communication channels across zones. Types: HTTP, Streaming/SSE, Websocket, WebRTC, P2P etc
    channels: [
        {zones: ['client', 'server'], type: 'websocket'}
    ]
}
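The computations above return { path, op, value } objects instead of assigning values. As a minimal sketch, here is a hypothetical interpreter that applies such ops to a plain nested object. It assumes 'rotate' cycles a status between 'pending' and 'done'. A real runtime would apply each op to the target's CRDT, where 'insert' and 'replace' would carry different merge semantics instead of behaving identically as they do here.

```javascript
// Assumption: 'rotate' cycles through these statuses in order
const STATUS_CYCLE = ['pending', 'done'];

// Apply one { path, op, value } result to a nested plain object (hypothetical runtime)
function applyOp(root, { path, op, value }) {
  const keys = path.split('/').filter(Boolean); // '/server/all_todos/u1' -> ['server', 'all_todos', 'u1']
  const last = keys.pop();
  let parent = root;
  for (const key of keys) {
    if (!(key in parent)) parent[key] = {}; // create intermediate maps on demand
    parent = parent[key];
  }
  switch (op) {
    case 'insert':
    case 'replace':
      parent[last] = value;
      break;
    case 'delete':
      delete parent[last];
      break;
    case 'rotate': {
      // Unknown statuses map to index -1, so they restart the cycle at 'pending'
      const i = STATUS_CYCLE.indexOf(parent[last]);
      parent[last] = STATUS_CYCLE[(i + 1) % STATUS_CYCLE.length];
      break;
    }
    default:
      throw new Error(`unknown op: ${op}`);
  }
  return root;
}

const state = { server: { all_todos: {} }, client: {} };
applyOp(state, { path: '/server/all_todos/alice/t1', op: 'insert', value: { task: 'buy milk', status: 'pending' } });
applyOp(state, { path: '/server/all_todos/alice/t1/status', op: 'rotate' });
console.log(state.server.all_todos.alice.t1.status); // done
```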

It does not need a whole new language

There is a research group trying to solve this with language constructs that completely abstract away where data and computation live. There may be value in that approach.

But we can get there much faster with any existing language. In fact, this model lets different computations be implemented in different programming languages, so we can mix and match them according to their individual strengths.

Conclusion

We can build a much better future.
