Decomposing web app development

The story of web applications has been incomplete for a long time. There are a lot of people working in web development, a lot of effort put into it, a lot of thought (I hope), and still we’re far, far away from complex, evolving, reactive web apps. It’s still the Dark Ages.

Web frameworks approach this problem by solving all problems at once. They mix rendering, state management, server communication and reactivity into one big ball of, ahem, software. The result is complex, hard to control, hard to combine, and rarely fits all your needs perfectly. Unless you’re writing a TodoMVC app. Then you have a lot of good options, with perfect documentation and loads of examples.

But there’s no reason it has to be that way. We can get closer to building large, maintainable browser apps by separating concerns and providing solutions for them independently.

Rendering the DOM was a big problem with a lot of somewhat-okay-ish solutions to choose from, but then React.js popped up and now, just one year later, React is really, really hard to ignore. Even I, working on the server side most of the day, have already published public praise for it.

Communication between components is still very much unexplored. Core.async is a more than fine foundation for it, but usage patterns and best practices have yet to emerge. I know, it is trivial on a small scale, just like connecting a plug to a socket, but when you have 100 cables to connect, you’d better wait and see how smart people do it first.

And then there’s application state. It has been a grey area for a long time, with most frameworks covering it either too aggressively (like Meteor.js) or as an afterthought. And that’s where DataScript enters.

You always start small, and at that stage any state management solution seems like overkill. You know, “I’ll do fine by just putting this into an array…”, “I’ll create a global variable to store the result of this AJAX request”, this kind of attitude. As you grow, this non-uniform, ad-hoc approach to state starts to get in your way. At some point in your life, you will need to query app state in interesting ways. You will need to subscribe to updates not in one model, but in two or three at once, or look for a specific pattern in the data. You will need rollback. You will need two-way server sync and a failure handling strategy, you will need caching and transactions. Unless you won’t. I mean, that’s a lot of needs, and I have no illusions that I can give you one single pill that will address all of them. And it’s not just me: so far nobody has succeeded in this area. But it doesn’t mean I can’t help.

If you’re familiar with Om, you may have noticed that the main thing it sells is not React integration. It’s a state management solution. Which is, in Om, just an atom (a mutable ref holding a pointer to an immutable tree) where you put the state of your whole app. This thing alone gives you a lot of nice properties: you can rewind to any point in time, you can subscribe to state changes, and synchronization logic can live outside of the components and is not their concern. Even rendering is, in fact, nicely decoupled from the state, being just one of many potential listeners to the state storage. Which is totally fine and a huge win by all means, the only problem being that your state is rarely a nested Hashmap. You can present your app state as a nested Hashmap, but you’ll soon realize that a component rarely depends on exactly one subtree of that structure. I mean, I wrote a 200-line Om app and I already faced this issue.
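To make the atom idea concrete, here is a minimal sketch in Python. It is not Om’s API: the names Atom, swap, listen and assoc_in are all made up for illustration, and the dicts are treated as immutable only by convention (every update builds a new one). It shows the properties listed above: old states stay around for rewinding, and any number of listeners can react to a change.

```python
class Atom:
    """A mutable reference to a value that is never modified in place.
    Hypothetical class for illustration, not Om's implementation."""

    def __init__(self, value):
        self._value = value
        self.history = [value]   # every state the atom has ever held
        self._listeners = []

    def deref(self):
        return self._value

    def listen(self, callback):
        # callback receives (old_state, new_state) after every change
        self._listeners.append(callback)

    def swap(self, fn, *args):
        old = self._value
        new = fn(old, *args)     # fn must return a new value, not mutate old
        self._value = new
        self.history.append(new)
        for callback in self._listeners:
            callback(old, new)
        return new


def assoc_in(state, path, value):
    """Return a copy of `state` with `value` at `path`; `state` is untouched."""
    if not path:
        return value
    key, rest = path[0], path[1:]
    return {**state, key: assoc_in(state.get(key, {}), rest, value)}


app_state = Atom({"user": {"name": "Ann"}, "todos": ()})

rendered = []
# "Rendering" is just one listener among potentially many.
app_state.listen(lambda old, new: rendered.append(new["user"]["name"]))

app_state.swap(assoc_in, ("user", "name"), "Bob")

first_state = app_state.history[0]   # the earlier state is still intact
```

Note how the listener has to reach into new["user"]["name"] itself. That is fine for one component reading one subtree, and it is exactly what gets awkward once a component needs data scattered across the tree.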

So, if we want to do better (which we do), how do we keep all these nice properties of Om? They come from two simple facts: state is an immutable data structure, and state management is uniform: everything your app cares about is stored in one single place.

DataScript is exactly that: it’s an immutable, uniform state management solution. You can think of it as a DB (and that’s totally correct because it imitates a server-side DB, Datomic), but very lightweight and purely in-memory. Or, to put it better, it’s an immutable data structure, like a Hashmap, with the ability to run non-trivial queries over it. The whole database is an immutable value, and at any point you can take its value, run a query over it (no matter if it’s the current actual database or a snapshot from 2 weeks ago), pass it to render, put it into an array, store it, send it over the wire and so on. It then adds a thin layer on top of that which provides atomic mutations and the ability to subscribe to data coming in and out of the database.
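A rough sketch of that shape, again in Python and again not DataScript’s actual API or query language. I’m assuming facts are stored as (entity, attribute, value) tuples, and Conn, transact, listen and match are hypothetical names. The point is only to show the two layers: a database that is a plain immutable value you can query, and a thin mutable wrapper that applies changes atomically and tells listeners what went in and out.

```python
def with_datoms(db, added, retracted=()):
    """Return a new database value; `db` itself is never changed."""
    return (db - frozenset(retracted)) | frozenset(added)


def match(db, e=None, a=None, v=None):
    """Return the facts matching a pattern; None acts as a wildcard.
    A toy stand-in for a real query engine."""
    return {
        d for d in db
        if (e is None or d[0] == e)
        and (a is None or d[1] == a)
        and (v is None or d[2] == v)
    }


class Conn:
    """Thin mutable layer over an immutable database value (hypothetical)."""

    def __init__(self, db=frozenset()):
        self.db = db
        self._listeners = []

    def listen(self, callback):
        self._listeners.append(callback)

    def transact(self, added, retracted=()):
        before = self.db
        after = with_datoms(before, added, retracted)
        self.db = after          # the whole change becomes visible at once
        report = {
            "db_before": before,
            "db_after": after,
            "added": after - before,
            "removed": before - after,
        }
        for callback in self._listeners:
            callback(report)
        return report


conn = Conn()
reports = []
conn.listen(reports.append)

conn.transact([(1, "email/subject", "Hello"), (1, "email/label", "inbox")])
snapshot = conn.db               # just a value: keep it, pass it around

conn.transact([(2, "email/subject", "Re: Hello"), (2, "email/label", "inbox")])

# The same query runs against the current database or an old snapshot.
inbox_now = {e for (e, _, _) in match(conn.db, a="email/label", v="inbox")}
inbox_then = {e for (e, _, _) in match(snapshot, a="email/label", v="inbox")}
```

Here inbox_then still sees only the first email, because the snapshot is a value that later transactions cannot touch.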

And that’s it. There’s nothing more to it. It does not do automatic server sync, it does not do lazy loading, it does not persist itself to local storage, it does not do reactive programming. Instead, it’s a foundation. A sound, capable primitive to build a storage solution that fits your application’s needs.

The idea of having a database running inside your browser sounds less crazy when you start to think about how much state modern client-side applications have to deal with. Take GMail, one of the pioneers of rich web applications: it loads a pile of emails organized into threads which are attached to labels. At each moment, you have up to three simultaneous views into the same dataset to be kept in sync. Stuff like this is most naturally expressed as queries to structured storage.

But browsers are so scarce in resources, you say. That’s why I don’t recommend thinking of DataScript as a database. In the traditional mindset, doing an SQL query is a pain. It’s a thing to avoid. For an in-memory database, there’s no particular overhead to it. You don’t do networking, you don’t do serialization/deserialization. It all comes down to a lookup in a data structure. Or a series of lookups. Or array iteration. You put little data in it, it’s fast. You put a lot of data in, well, at least it has indexes. That should do better than you filtering an array by hand anyway. Yes, you may even get some performance benefits out of it, although that’s not a primary objective. But the thing is really lightweight.
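To illustrate the “at least it has indexes” point, here is a small Python sketch comparing a hand-written filter with an index lookup. The (entity, attribute, value) tuples and the build_av_index helper are assumptions made for this example; it does not describe how DataScript actually lays out its indexes.

```python
from collections import defaultdict


def build_av_index(datoms):
    """Map (attribute, value) -> set of entity ids. Built once, reused often."""
    index = defaultdict(set)
    for e, a, v in datoms:
        index[(a, v)].add(e)
    return index


# 1000 made-up emails; every tenth one is labelled "inbox".
datoms = [
    (i, "email/label", "inbox" if i % 10 == 0 else "archive")
    for i in range(1000)
]

# Filtering by hand: walks all 1000 facts on every query.
scanned = {e for e, a, v in datoms if a == "email/label" and v == "inbox"}

# With an index: one dictionary lookup per query.
index = build_av_index(datoms)
looked_up = index[("email/label", "inbox")]
```

Both approaches return the same entities. The index costs one pass to build and some memory, which pays off as soon as the same kind of question is asked more than a few times.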

So, here’s DataScript. Check out the repo. I hope this example will motivate other people to build other solutions with different performance characteristics and different usage experience. I’ll definitely be glad to see that. The idea is not to use my library, but to have tons of libraries with intentionally narrow scope and excellent combinability. As an app developer, I want to decompose the needs of my app and, for every one of them, choose the best possible solution out there. Maybe I’ll have to write some glue code, but in this perfect world, I’m totally ok with it.

April 24, 2014

Discussion on HackerNews