Understanding the “NoSQL movement”
This is a fundamentally sound idea and can certainly help, but (depending on your particular application) it may prove less than ideal. If there isn’t much crossover between different users it will work fine, but once there is, your application becomes more and more complex, and you’ll end up reading from and writing to both servers anyway.
Adding more servers (for example, to split the alphabet across four instead of two) will require modifying your access code, and migrating all the data may well require significant downtime. When you reach this stage you have to start becoming really clever.
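To see why adding servers is painful, here is a minimal sketch of alphabet-based partitioning. The function name shard_for and the sample usernames are illustrative assumptions, not part of any real system; the point is only that the same user can map to a different server once the number of servers changes.

```python
import string

def shard_for(username, num_shards):
    """Return the index of the server holding this user.

    The alphabet is cut into num_shards equal ranges by first letter.
    Assumption: empty or non-alphabetic names go to the first server.
    """
    if not username or username[0].upper() not in string.ascii_uppercase:
        return 0
    position = string.ascii_uppercase.index(username[0].upper())  # 0..25
    return position * num_shards // 26

users = ["alice", "bob", "nina", "zoe"]

# Where each user lives with two servers, and with four.
two = {u: shard_for(u, 2) for u in users}
four = {u: shard_for(u, 4) for u in users}

# Users whose data would have to be migrated when going from two to four.
moved = [u for u in users if two[u] != four[u]]
print(moved)
```

Every user in moved has to be copied to a new server before the new routing can go live, which is where the downtime comes from.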
The next step for many people is to partition the data vertically as well, which means splitting the individual data items you need to store into different portions and using a different server for each part.
Diagram 3 shows an example in which a website permits its users both to post status updates and upload images, and there’s a separate master database server for each of these different tasks and data types, rather than holding all the data in a single database.
Again, this sort of architecture will certainly help, but often you’ll still need one database to contain “master data” that needs to be shared across all the others – for example, usernames and passwords in this case – and once that server becomes overloaded, you start to seriously run out of options.
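The bottleneck described above can be sketched in a few lines. The server names and request types below are hypothetical, and the sketch assumes that every request checks the user's account on the master server before touching its own data.

```python
from collections import Counter

# Hypothetical mapping of data type to the server that stores it.
SERVERS = {
    "status_update": "status-db",
    "image": "image-db",
    "account": "master-db",
}

def servers_for_request(data_type):
    """List the servers one request touches.

    Assumption: every request first checks the user's account on the
    master server, then goes to the server for its own data type.
    """
    if data_type not in SERVERS:
        raise ValueError("unknown data type: " + data_type)
    targets = ["master-db"]
    if SERVERS[data_type] != "master-db":
        targets.append(SERVERS[data_type])
    return targets

# Count how often each server is hit for a small mix of requests.
load = Counter()
for request in ["status_update", "image", "status_update", "account"]:
    load.update(servers_for_request(request))

print(load)
```

However the work is divided between the status and image servers, the master is hit by every single request, so it is the first to run out of headroom.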
Plenty of incredibly smart people have been thinking about this problem for a long time, and a number of alternatives have emerged, many of them built around the concept of an “eventually consistent” data store.
The idea here is that you have several instances of the whole database, which are often distributed geographically so that one is in the UK, one on the West Coast of the US, one on the East Coast and so on. Reads can come from, and writes can go to, any one of these stores, and over time – hopefully not too much time – any changes made to one are propagated to all the others.
This doesn’t have to happen absolutely immediately, though, hence the term “eventually” consistent.
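As a rough illustration of that propagation, the toy model below keeps three in-memory copies of the data and delivers changes to the other copies only when propagate() is called, which stands in for the passage of time. The class names, replica names and key are all assumptions made for the sketch, and it ignores the hard part of real systems, namely what to do when two stores change the same item at once.

```python
from collections import deque

class Replica:
    """One full copy of the data, plus a queue of changes not yet applied."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.inbox = deque()

class Cluster:
    def __init__(self, names):
        self.replicas = {name: Replica(name) for name in names}

    def write(self, replica_name, key, value):
        """Apply the write locally, then queue it for every other replica."""
        self.replicas[replica_name].data[key] = value
        for name, peer in self.replicas.items():
            if name != replica_name:
                peer.inbox.append((key, value))

    def read(self, replica_name, key):
        """Read from one replica only; it may not have the latest value yet."""
        return self.replicas[replica_name].data.get(key)

    def propagate(self):
        """Deliver all pending changes to every replica."""
        for replica in self.replicas.values():
            while replica.inbox:
                key, value = replica.inbox.popleft()
                replica.data[key] = value

cluster = Cluster(["uk", "us-west", "us-east"])
cluster.write("uk", "profile:42", "new status")

stale = cluster.read("us-west", "profile:42")  # change has not arrived yet
cluster.propagate()
fresh = cluster.read("us-west", "profile:42")  # now matches the UK copy
```

Between the write and the call to propagate(), the UK store and the West Coast store disagree; afterwards they hold the same value again.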
At first glance this doesn’t look like a good idea, because surely you want all the data stores to contain identical information all the time, don’t you? In practice, though, it’s a compelling solution in many application areas. First of all, “eventually” doesn’t actually need to mean “in several days”, and usually a change to any piece of data would be propagated in a relatively short space of time – typically, less than a second.
For many commercial applications, not having new or changed data become available absolutely instantly isn’t a problem. Consider an application such as Facebook: if Alice changes her profile by writing to one of the data stores, and Bob doesn’t see that change for a second or two, is that the end of the world? Not really. The change will be propagated to the data store Bob is reading from pretty quickly, and certainly by the next time he refreshes the page he’ll see her updated status.
On the other hand, you certainly couldn’t employ such a system for an online banking website, where those few seconds of delay might completely invalidate a transaction. But for the majority of social networking and e-commerce applications out there, it’s good enough.
So what’s this “NoSQL” idea? Well, it stems from the fact that most of the eventually consistent solutions out there don’t actually use SQL – Structured Query Language, used by relational database management systems to manipulate data – at all.