YeSQL? How a Spanner changes the toolbox

web scale?

The world of data continues to evolve. The last explosion was with the push on the NoSQL bandwagon. In the rush to solve the problem of not being able to scale that Oracle DB, many ended up giving up a lot along the way. Maybe, the problem wasn’t SQL after all, it was more dealing with the growth in data. It was one thing to have client server apps that needed data, but once we got to the infamous “web scale” we started to run into problems.

The database crew tried hard to keep us tied to the database. Stored procedures resulted in the SQL DSL becoming more and more complex and divergent (I was on the Sybase side which was much superior to Oracle!! 🙂

Then desperation kicked in and the JVM was put into the database itself to be close to the data.

MySQL and memcached got us a long way, and then there was a split. A lot of the scaling issues were revolving around the desire to normalize your data, and the JOIN hell couldn’t perform. One solution was to denormalize and treat your MySQL bigtable-style and give up on some of the relational concepts in the DB, handling that in the app layer (the opposite direction to stored procs).

Other’s jumped in on “NoSQL”, leaving behind the SQL and relational world. The path to scaling was eventual consistency. Different data models. An explosion of new approaches came into being, which has been exciting even if it has come with some pain (choice, immaturity).

We can unbundle the various trade offs. With Spanner, for example, you can benefit from the ecosystem and tools around SQL, but handle scaling and cheating around CAP theorem. Being able to make the choice to not give up SQL, but get consistency benefits, changes the fundamentals involved in making a choice.

Now, SQL can certainly be frustrating. We have lived through the ORM years, and how much code have you written that marshals between strings and the relational model as we map to an OO model. If you believe in type systems, it can feel like clawing your eyes out when you see code that converts between the boundaries. Whenever you bridge between worlds like this (e.g. JavaScript to native land in React Native) you are holding your nose and valuing the benefits that you get on the other side.

SQLite did such a good job as an embedded database that we ended up getting it on all platforms, including mobile. This is a lil confusing at times. Many of the benefits with SQL are on the backed (OLAP, OLTP, etc) and having to go through the mapping often feels like real overkill. Wouldn’t it be nicer to take your application state, and users data, and just …. save it? Thus we got solutions that allowed you to do just that, and then tackle other problems such as the notion of multiple clients and real-time (e.g. the original Firebase real time database!). For many applications being able to use a tool such as Firebase is a great solution as it is easy to reason with for multiple clients, and you get a rich way to query for more complex use cases.

But, what about other use cases? I was working on a knowledge base system that allowed you to store your own sets of knowledge, and also share this data with other users. This predated noms (from the amazing Aaron Boodman and team) that gives you “git for data” feel. Our solution was more of a shared graph, so it naturally fit with a graph database. At first we used Neo4j, and then GraphQL came along which gives you an interface on top of various stores.

We needed the following:

  • The user will have data that needs to be sync’d across all of their devices and surfaces
  • The data can be connected to multiple users
  • Users could shared edit at times
  • Users could also fork data (and merge) to allow for differing content without siloing forever
  • Have as much data as possible on device so it works offline.

As soon as the system is big you get into obvious conditions. Most of the time “your own” data could always be available on device. But the shared knowledge graph was often too large. We had to come up with good strategies to sync a subset of the overall graph. When running searches we needed to run multiples queries: on locally to get you results asap, and another to the backend to gather others that were out of our local scope.

This resulted in a bunch of infrastructure work, and I am still on the lookout for better solutions (I would love to hear your thoughts!). I really enjoyed using a graph database in this context as it truly fit.

As we build offline first, there is still room for solutions that truly nail the experience for client developers, and still give you the data you need on the backend. I am excited to see the energy in the database market these days and how it nicely maps into the reactive world where data changes cause UI updates. The data change can stream in from a user interaction, or from the backend, and it doesn’t matter!

NoSQL is still evolving. Graph databases are a new hotness. And we keep seeing new entrants but I am also really excited to see SQL itself getting a shot in the arm with Spanner. I sit back and imagine SQL on the client and server with a sync solution in between and wonder…….

Leave a comment

Your email address will not be published. Required fields are marked *