Scaling

Now that you know how to get Django running on a single server, let’s look at how you can scale out a Django installation. This section walks through how a site might scale from a single server to a large-scale cluster that could serve millions of hits an hour.

It’s important to note, however, that nearly every large site is large in different ways, so scaling is anything but a one-size-fits-all operation. The following coverage should suffice to show the general principle, and whenever possible we’ll try to point out where different choices could be made. First off, we’ll make a pretty big assumption and exclusively talk about scaling under Apache and mod_python. Though we know of a number of successful medium- to large-scale FastCGI deployments, we’re much more familiar with Apache.

Running on a Single Server – Most sites start out running on a single server, with an architecture that looks something like Figure 20-1.

image029

Figure 20-1: a single server Django setup.

 

This works just fine for small- to medium-sized sites, and it’s relatively cheap — you can put together a single-server site designed for Django for well under $3,000. However, as traffic increases you’ll quickly run into resource contention between the different pieces of software. Database servers and Web servers love to have the entire server to themselves, so when run on the same server they often end up “fighting” over the same resources (RAM, CPU) that they’d prefer to monopolize. This is solved easily by moving the database server to a second machine.

Separating Out the Database Server

As far as Django is concerned, the process of separating out the database server is extremely easy: you’ll simply need to change the DATABASE_HOST setting to the IP or DNS name of your database server. It’s probably a good idea to use the IP if at all possible, as relying on DNS for the connection between your Web server and database server isn’t recommended. With a separate database server, our architecture now looks like Figure 13-2.

 

Here we’re starting to move into what’s usually called n-tier architecture. Don’t be scared by the buzzword – it just refers to the fact that different tiers of the Web stack get separated out onto different physical machines.

 At this point, if you anticipate ever needing to grow beyond a single database server, it’s probably a good idea to start thinking about connection pooling and/or database replication. Unfortunately, there’s not nearly enough space to do those topics justice in this book, so you’ll need to consult your database’s documentation and/or community for more information.

 

image031

Figure : Moving the database onto a dedicated server.

Running a Separate Media Server

We still have a big problem left over from the single-server setup: the serving of media from the same box that handles dynamic content. Those two activities perform best under different circumstances, and by smashing them together on the same box you end up with neither performing particularly well. So the next step is to separate out the media — that is, anything not generated by a Django view — onto a dedicated server.

image033

Figure 20-3: Separating out the media server.

 

Ideally, this media server should run a stripped-down Web server optimized for static media delivery. lighttpd and tux (http://www.djangoproject.com/r/tux/) are both excellent choices here, but a heavily stripped down Apache could work, too. For sites heavy in static content (photos, videos, etc.), moving to a separate media server is doubly important and should likely be the first step in scaling up.

This step can be slightly tricky, however. Django’s admin needs to be able to write uploaded media to the media server (the MEDIA_ROOT setting controls where this media is written). If media lives on another server, however, you’ll need to arrange a way for that write to happen across the network.

Implementing Load Balancing and Redundancy

At this point, we’ve broken things down as much as possible. This three-server setup should handle a very large amount of traffic — we served around 10 million hits a day from an architecture of this sort — so if you grow further, you’ll need to start adding redundancy.

This is a good thing, actually. One glance at Figure 20-3 shows you that if even a single one of your three servers fails, you’ll bring down your entire site. So as you add redundant servers, not only do you increase capacity, but you also increase reliability.

It’s relatively easy to get multiple copies of a Django site running on different hardware – just copy all the code onto multiple machines, and start Apache on all of them. However, you’ll need another piece of software to distribute traffic over your multiple servers: a load balancer.

 You can buy expensive and proprietary hardware load balancers, but there are a few high-quality open source software load balancers out there. Apache’s mod_proxy is one option, but we’ve found Perlbal to be fantastic. It’s a load balancer and reverse proxy written by the same folks who wrote memcached.

With the Web servers now clustered, our evolving architecture starts to look more complex, as shown in Figure 20-4.

image035

Figure 20-4: A load-balanced, redundant server setup.

 

Notice that in the diagram the Web servers are referred to as a “cluster” to indicate that the number of servers is basically variable. Once you have a load balancer out front, you can easily add and remove back-end Web servers without a second of downtime.

Going Big

At this point, the next few steps are pretty much derivatives of the last one:

  • As you need more database performance, you’ll need to add replicated database servers. MySQL includes built-in replication; PostgreSQL users should look into Slony (http://www.djangoproject.com/r/slony/) and pgpool (http://www.djangoproject.com/r/pgpool/) for replication and connection pooling, respectively.
  • If the single load balancer isn’t enough, you can add more load balancer machines out front and distribute among them using round-robin DNS.
  • If a single media server doesn’t suffice, you can add more media servers and distribute the load with your load-balancing cluster.
  • If you need more cache storage, you can add dedicated cache servers.
  • At any stage, if a cluster isn’t performing well, you can add more servers to the cluster.
  • After a few of these iterations, a large-scale architecture might look like Figure 20-5.

image037

Figure 20-5. An example large-scale Django setup.

 

Though we’ve shown only two or three servers at each level, there’s no fundamental limit to how many you can add.

Back to Tutorial

Using Tomcat libraries With Maven
Performance Tuning

Get industry recognized certification – Contact us

keyboard_arrow_up
Open chat
Need help?
Hello 👋
Can we help you?