At its core, the philosophy of shared nothing is really just the application of loose coupling to the entire software stack. This architecture arose in direct response to what was at the time the prevailing architecture: a monolithic Web application server that encapsulates the language, database, and Web server (even parts of the operating system) into a single process, as in the Java application-server world.
When it comes time to scale, this can be a major problem; it's nearly impossible to split the work of a monolithic process across many different physical machines, so monolithic applications require enormously powerful servers. These servers, of course, cost tens or even hundreds of thousands of dollars, putting large-scale Web sites out of the reach of cash-strapped individuals and small companies.
What the LAMP community noticed, however, was that if you broke each piece of the Web stack up into individual components, you could easily start with an inexpensive server and simply add more inexpensive servers as you grew. If your $3,000 database server couldn’t handle the load, you’d simply buy a second (or third, or fourth) until it could. If you needed more storage capacity, you’d add an NFS server.
For this to work, though, Web applications had to stop assuming that the same server would handle each request — or even each part of a single request. In a large-scale LAMP (and Django) deployment, as many as half a dozen servers might be involved in handling a single page! The repercussions of this are numerous, but they boil down to these points:
- State cannot be saved locally. In other words, any data that must be available across requests must be kept in some sort of shared, persistent storage, like the database or a centralized cache (see the first sketch after this list).
- Software cannot assume that resources are local. For example, the Web platform cannot assume that the database runs on the same server; it must be capable of connecting to a remote database server (see the second sketch below).
- Each piece of the stack must be easy to move or replicate. If Apache doesn't work for a given deployment, you should be able to swap it out for another Web server with a minimum of fuss. Or, on a hardware level, if a Web server fails, you should be able to replace it with another physical box with minimal downtime. Remember, this whole philosophy is based around deployment on cheap, commodity hardware; failure of individual machines is to be expected. (The WSGI sketch below shows why the Web server layer, in particular, is swappable.)

As you've probably come to expect, Django handles this more or less transparently (no part of Django violates these principles), but knowing the philosophy helps when it comes time to scale up.
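To make the first point concrete, here's a minimal sketch of pushing session state out of the local process and into a shared cache. `SESSION_ENGINE` and `CACHES` are real Django settings (the `CACHES` form shown is the one used in recent Django releases, and the `PyMemcacheCache` backend assumes the `pymemcache` package is installed); the hostname is a placeholder.

```python
# settings.py -- a sketch of shared-nothing session storage.
# Any Web server in the pool can handle any request, because session
# data lives in a centralized cache rather than in local memory.

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        # A cache server shared by the whole pool;
        # "cache1.example.com" is a placeholder hostname.
        "LOCATION": "cache1.example.com:11211",
    }
}

# Store sessions in the cache defined above, not on the local machine.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
```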
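For the second point, Django's database configuration takes a host name rather than assuming a local socket, so moving the database to its own machine is a settings change, not a code change. A sketch, in the modern `DATABASES` form (the host, database name, and credentials are placeholders):

```python
# settings.py -- the database runs on a dedicated machine.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp",            # placeholder database name
        "USER": "myapp",            # placeholder credentials
        "PASSWORD": "secret",
        "HOST": "db1.example.com",  # a separate, remote database server
        "PORT": "5432",
    }
}
```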
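And the third point is largely what the WSGI interface buys you: Django exposes the whole application as a standard WSGI callable, so the Web server in front of it is interchangeable. A sketch of the usual entry point (`mysite.settings` is a placeholder module path):

```python
# wsgi.py -- the standard entry point that any WSGI server can load.
import os

from django.core.wsgi import get_wsgi_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")

# Apache/mod_wsgi, Gunicorn, uWSGI, and friends all call this same
# object, so swapping Web servers doesn't touch the application code.
application = get_wsgi_application()
```

For example, `gunicorn mysite.wsgi:application` and an Apache/mod_wsgi configuration pointing at the same file serve the identical application.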
But Does It Work?

This philosophy might sound good on paper (or on your screen), but does it actually work? Well, rather than answer directly, let's look at a by-no-means-complete list of companies that have based their business on this architecture. You might recognize some of these names:
- Amazon
- Blogger
- Craigslist
- LiveJournal
- Slashdot
- Wikipedia
- Yahoo
- YouTube