Caching, load balancing and optimization
Caching
There are different ways to make your website faster: specialized plugins to cache entire rendered HTML pages, plugins to cache all SQL queries and data objects, plugins to minimize JavaScript and CSS files and even some server-side solutions.
But even if you use such plugins, using internal caching methods for objects and database results is a good development practice, so that your plugin doesn’t depend on which cache plugins the end user has. Your plugin needs to be fast on its own, rather than depending on other plugins to do the dirty work. And if you think you need to write your own cache handling code, you are wrong. WordPress comes with everything you need to quickly implement varying degrees of data caching. Just identify the parts of your code that would benefit from optimization, and choose a type of caching.
WordPress implements two different caching methods:
- Non-persistent - The data remains in the cache during the loading of the page. (WordPress uses this to cache most database query results.)
- Persistent - This depends on the database to work, and cached data can auto-expire after some time. (WordPress uses this to cache RSS feeds, update checks, etc.)
Non-Persistent Cache
When you use functions such as get_posts() or get_post_meta(), WordPress first checks to see whether the data you require is cached. If it is, then you will get data from the cache; if not, then a database query is run to get the data. Once the data is retrieved, it is also cached. A non-persistent cache is recommended for database results that might be reused during the creation of a page.
The code for WordPress’ internal non-persistent cache is located in the cache.php file in the wp-includes directory, and it is handled by the WP_Object_Cache class. We need to use two basic functions: wp_cache_set() and wp_cache_get(), along with the additional functions wp_cache_add(), wp_cache_replace(), wp_cache_flush() and wp_cache_delete(). Cached storage is organized into groups, and each entry needs its own unique key. To avoid mixing with WordPress’ default data, using your own unique group names is best.
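The differences between these functions are easiest to see side by side. The following is a minimal sketch, assuming it runs inside WordPress with the default object cache; the function name d4p_cache_demo(), the group d4p_demo and the key greeting are made up for illustration.

```php
<?php
// Hypothetical demo function. It only works inside WordPress,
// where the wp_cache_*() functions are defined.
function d4p_cache_demo() {
    $group   = 'd4p_demo'; // Our own group, to stay clear of core's data.
    $results = array();

    // add() stores the value only if the key is not in the group yet.
    $results['first_add']  = wp_cache_add( 'greeting', 'hello', $group );
    $results['second_add'] = wp_cache_add( 'greeting', 'ignored', $group );

    // set() always stores the value, overwriting what was there.
    wp_cache_set( 'greeting', 'hi', $group );
    $results['after_set'] = wp_cache_get( 'greeting', $group );

    // replace() stores the value only if the key already exists.
    $results['replace_missing'] = wp_cache_replace( 'no_such_key', 'x', $group );

    // delete() removes one entry; get() then reports a miss with false.
    wp_cache_delete( 'greeting', $group );
    $results['after_delete'] = wp_cache_get( 'greeting', $group );

    return $results;
}
```

The first add succeeds and the second is refused, the value read after set() is 'hi', replace() on a missing key is refused, and the read after delete() is a miss.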
Example
For this example, we will create a function named d4p_get_all_post_meta(), which will retrieve all meta data associated with a post. This first version doesn’t involve caching.
    function d4p_get_all_post_meta( $post_id ) {
        global $wpdb;

        $data = array();

        // prepare() inserts $post_id as an integer, so it cannot alter the query.
        $raw = $wpdb->get_results(
            $wpdb->prepare(
                "SELECT meta_key, meta_value FROM $wpdb->postmeta WHERE post_id = %d",
                $post_id
            ),
            ARRAY_A
        );

        // Group the values by key; one key can hold several values.
        foreach ( $raw as $row ) {
            $data[ $row['meta_key'] ][] = $row['meta_value'];
        }

        return $data;
    }
Every time you call this function for the same post ID, an SQL query will be executed. Here is the modified function that uses WordPress’ non-persistent cache:
    function d4p_get_all_post_meta( $post_id ) {
        global $wpdb;

        // wp_cache_get() returns false on a cache miss.
        $data = wp_cache_get( $post_id, 'd4p_post_meta' );

        if ( false === $data ) {
            $data = array();

            // prepare() inserts $post_id as an integer, so it cannot alter the query.
            $raw = $wpdb->get_results(
                $wpdb->prepare(
                    "SELECT meta_key, meta_value FROM $wpdb->postmeta WHERE post_id = %d",
                    $post_id
                ),
                ARRAY_A
            );

            foreach ( $raw as $row ) {
                $data[ $row['meta_key'] ][] = $row['meta_value'];
            }

            // Store the result for later calls during this page load.
            wp_cache_add( $post_id, $data, 'd4p_post_meta' );
        }

        return $data;
    }
Here, we are using a cache group named d4p_post_meta, and the post ID is the key. The function first calls wp_cache_get() to check whether the data is already in the cache. If it is not, we run the normal code to get the data and then add it to the cache with wp_cache_add(). So, if you call this function more than once for the same post, only the first call will run an SQL query; all other calls will get data from the cache. The check compares the result with false explicitly, so a post without any meta data (an empty array) is also served from the cache instead of being queried again. We are using the wp_cache_add function here, so if the key-group combination already exists in the store, it will not be replaced. Compare this with wp_cache_set, which will always overwrite an existing value without checking.
As you can see, we’ve made just a small change to the existing code but potentially saved a lot of repeated database calls during the page’s loading.
Notes
- Non-persistent cache is available only during the loading of the current page; once the next page loads, it will be blank once again.
- The storage size is limited by the total available memory for PHP on the server. Do not store large data sets, or you might end up with an “Out of memory” message.
- Using this type of cache makes sense only for operations repeated more than once in the creation of a page.
- It works with WordPress since version 2.0.
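One case the d4p_get_all_post_meta() example above does not cover is meta data that changes later in the same page load: the cached copy would then be stale. A minimal sketch of one way to handle this, assuming the d4p_post_meta group from the example; the function name d4p_flush_post_meta_cache() is made up for illustration.

```php
<?php
// Hypothetical helper: drop the cached meta for one post so that the next
// call to d4p_get_all_post_meta() reads fresh data from the database.
// The first argument is supplied by the meta hooks and is not needed here.
function d4p_flush_post_meta_cache( $meta_id, $post_id ) {
    return wp_cache_delete( (int) $post_id, 'd4p_post_meta' );
}
```

To wire it up, register the function from your plugin on the added_post_meta, updated_post_meta and deleted_post_meta actions, for example with add_action( 'updated_post_meta', 'd4p_flush_post_meta_cache', 10, 2 ). All three actions pass the post ID as their second argument.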
Database-Driven Temporarily Persistent Cache
This type of cache relies on a feature built into WordPress called the Transient API. Transients are stored in the database (similar to most WordPress settings, in the wp_options table). Transients need two records in the database: one to store the expiration time and one to store the data. When cached data is requested, WordPress checks the timestamp and does one of two things. If the expiration time has passed, WordPress removes the data and returns false as a result. If the data has not expired, another query is run to retrieve it. The good thing about this method is that the cache persists even after the page has loaded, and it can be used for other pages for as long as the transient’s expiration time has not passed.
If your database queries are complex and/or produce results that might not change often, then storing them in the transient cache is a good idea. This is an excellent solution for most widgets, menus and other page elements.
Example
Let’s say we wanted an SQL query to retrieve 20 posts from the previous month, along with some basic author data such as name, email address and URL. But we want posts from only the top 10 authors (sorted by their total number of posts in that month). The results will be displayed in a widget.
When tested on my local machine, this SQL query took 0.1710 seconds to run. If we had 1000 page views per day, this one query would take 171 seconds every 24 hours, or 5130 seconds per month. Relatively speaking, that is not much time, but we could do much better by using the transient cache to store these results with an expiration time of 30 days. Because the results of this query will not change during the month, the transient cache is a great way to optimize resources.
Returning to my local machine, the query that reads the data from the transient cache takes only 0.0006 seconds, or 18 seconds per month at the same traffic. The advantage of this method is obvious in this case: we’ve saved 85 minutes each month with this one widget. Not bad at all. There are cases in which you could save much, much more (such as with very complex menus). The more complex the SQL queries or operations being cached, the greater the savings.
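The arithmetic behind these figures can be checked with a few lines of PHP. This is only a sketch: the function name d4p_monthly_query_seconds() is made up, a month is taken as 30 days, and the cached figure ignores the occasional request that has to rebuild the cache.

```php
<?php
// Total seconds spent on one query per month, given its run time,
// the number of page views per day and the number of days in the month.
function d4p_monthly_query_seconds( $seconds_per_query, $views_per_day, $days = 30 ) {
    return $seconds_per_query * $views_per_day * $days;
}

$uncached = d4p_monthly_query_seconds( 0.1710, 1000 ); // 5130 seconds
$cached   = d4p_monthly_query_seconds( 0.0006, 1000 ); // 18 seconds
$saved    = ( $uncached - $cached ) / 60;              // about 85 minutes

printf( "%.0f s uncached, %.0f s cached, %.1f min saved\n", $uncached, $cached, $saved );
```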
Let’s look at the actual code, both before and after implementing the transient cache. Below is the normal function to get the data. In this example, the SQL query is empty (because it is long and would take too much space here), but the entire widget is linked to at the end of this article.
    function d4p_get_query_results() {
        global $wpdb;

        $data = $wpdb->get_results(' // SQL query // ');

        return $data;
    }
And here is the function using the transient cache, with a few extra lines to check whether the data is cached.
    function d4p_get_query_results() {
        global $wpdb;

        $data = get_transient( 'my_transient_key' );

        // get_transient() returns false when the key is missing or has expired.
        if ( false === $data ) {
            $data = $wpdb->get_results(' // SQL query // ');

            // Keep the results for 30 days (the expiration is given in seconds).
            set_transient( 'my_transient_key', $data, 3600 * 24 * 30 );
        }

        return $data;
    }
The function get_transient (or get_site_transient for a network) needs a name for the transient record key. If the key is not found or the record has expired, then the function will return false. To add a new transient cache record, you will need the record key, the object with the data and the expiration time (in seconds), and you will need to use the set_transient function (or set_site_transient for a network).
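When the cached query depends on parameters (a month or an author ID, say), each combination needs its own record key, and transient names have a length limit (see the notes at the end of this section). A minimal sketch of one way to build such keys; the function name d4p_transient_key() and the default limit of 45 characters are assumptions for illustration.

```php
<?php
// Hypothetical helper: build a transient name from a short prefix and a hash
// of the query parameters, never longer than $max_length characters.
// $max_length should be at least 32, the length of the hash.
function d4p_transient_key( $prefix, array $args, $max_length = 45 ) {
    $hash   = md5( serialize( $args ) ); // Always 32 hexadecimal characters.
    $room   = max( 0, $max_length - strlen( $hash ) );
    $prefix = substr( $prefix, 0, $room );

    return $prefix . $hash;
}
```

For example, d4p_transient_key( 'd4p_top_', array( 'month' => '2012-03', 'limit' => 20 ) ) returns the same 40-character key every time it is called with the same arguments, and a different key when the arguments change.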
If your data changes, you can remove it from the cache. You will need the record key and the delete_transient function (or delete_site_transient for a network). In this example, if one of the posts in the cached results is deleted or changed in some way, you could delete the cache record with this:
    delete_transient( 'my_transient_key' );
Notes
- The theoretical maximum size of data you can store in this way is 4 GB. But usually you would keep much smaller amounts of data in a transient (up to a couple of MB).
- Use this method only for data (or operations) that do not change often, and set the expiration time to match the cycle of data changes.
- A common use is to cache rendered output: generate the HTML from a series of database queries once, store the resulting HTML in the transient, and serve that until it expires.
- In WordPress versions before 4.4, the name of the transient record may not be longer than 45 characters, or 40 characters for “site transients” (used with multi-sites to store data at the network level). From version 4.4 on, the limits are 172 and 167 characters.
- The Transient API has been part of WordPress since version 2.8 (site transients since 2.9).
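The get-or-rebuild pattern from the example can be wrapped in a small helper for caching rendered HTML. This is a sketch under assumptions, not part of the original widget: the name d4p_cached_fragment() is made up, and the render callback must return the HTML as a string rather than echo it.

```php
<?php
// Hypothetical helper: return the cached HTML stored under $key, or build it
// with $render, store it for $ttl seconds and return it.
function d4p_cached_fragment( $key, $ttl, callable $render ) {
    $html = get_transient( $key );

    // false means the transient is missing or has expired.
    if ( false === $html ) {
        $html = call_user_func( $render );
        set_transient( $key, $html, $ttl );
    }

    return $html;
}
```

A widget could then output echo d4p_cached_fragment( 'd4p_top_authors_html', 3600 * 24, 'd4p_render_top_authors' ); where d4p_render_top_authors() stands for your own function that runs the queries and returns the markup.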
Load Balancing
A single server can only handle so many connections. After a certain point it just can’t handle any more users, and the website starts to slow down or even become unreachable. A load balancer is a device that can spread those visitors over a bunch of different servers, allowing the “load” of those visitors and their requests to be “balanced” over a number of different devices. Especially with the Rackspace Cloud, creating more servers to handle increased load is remarkably simple. Even if you don’t have that much traffic to worry about, in the event that your server crashes a load balancer will let you spin up new servers and immediately start sending traffic their way instead of having to wait for the DNS records to update (which could take 24 hours to 48 hours).
Simple analogy: a single server is like a single twig and will easily break, but a load balancer allows you to tie a whole bunch of twigs together to make a stronger solution.
Before we get started, here’s a quick overview of the steps we’re going to take:
1. Spin up and configure a complete WordPress server.
2. Duplicate the WordPress server and set up as SQL server.
3. Force WordPress server to use remote SQL server.
4. Configure cloud storage solution for images and uploaded files.
5. Configure Load Balancer and DNS records.
1. Spin up and configure a complete WordPress server
The easiest way to configure the load balanced system is to start with a complete and functioning server with everything running just the way you like it, and then slice it up from there. So the first step is to spin up a new server. Windows, Linux, whatever — it doesn’t really matter. But for the purposes of this walk-through we’re going to be using Fedora Linux 16. Rackspace already has a great article about how to set up a cloud server instance, so I’m not going to go over that again.
Once you have your server set up you need to have the following things running:
- Webserver (IIS / Apache)
- SQL Server (Microsoft SQL / MySQL)
- PHP
Personally, I’m an Apache/MySQL/PHP kind of guy as they’re much easier to configure for me, but pick your own poison. Again, Rackspace has some good articles on how to do it or you can scour Google. I also like to slap PHPMyAdmin on there because it makes administrating MySQL a whole lot easier.
At this point you should have everything squared away on your server — MySQL up and running, WordPress installed and properly configured, firewall allowing HTTP access and the operating system fully updated. If you have a domain name for the WordPress site you should point it to the server now and make sure everything is working using the domain instead of the IPs — this will save you some trouble down the road. Go ahead and take a quick image of this server (another Rackspace article there) so that if everything goes sideways you still have something to work with.
2. Duplicate the WordPress server and set up as SQL server
Now we need to slice out the MySQL server and set it running on a standalone system.
WordPress uses the SQL database to store just about everything, including the text of your posts and the comments. When you have a single server it makes sense to have the SQL server on the same box, but when you have multiple servers running the same website the only way to have them all running and showing the exact same content simultaneously is to have a single central SQL server with the “master” copy of the database. You’ll still need to replicate the uploaded files (pictures and so forth) across the different web servers, though.
So, now that everything is running you will need to spin up a new server to act as the central SQL server. Take an image of the running server that you have and spin up a new one from that image (see article here).
Now that you have an identical server running we need to slice some things out to slim it down and improve performance, as well as allow other servers to use it.
1. Set the permissions on the MySQL Database for your WordPress installation to allow connections from anywhere and not just the local system. This is best accomplished using the “Permissions” tab in PHPMyAdmin.
2. Disable Apache on the system. Don’t uninstall it (as you’ll need it for the PHPMyAdmin system), simply disable it. Use this command: sudo service httpd stop
We also need to open the firewall to allow MySQL connections (port 3306). This traffic should never leave the Rackspace network, but since it is still technically a public connection you should still pick a good username / password combination to protect the server. Open the iptables config file in a text editor (sudo vi /etc/sysconfig/iptables) and add the following line above the existing -A INPUT rules, so that it is evaluated before any rule that rejects traffic:
-A INPUT -p tcp --dport mysql -j ACCEPT
Save the file and reboot the server for good measure.
3. Force WordPress server to use remote SQL server
You now have one working COMPLETE server, and one working SQL-only server. The next step is to tie the two together.
WordPress uses a file called “wp-config.php” to keep track of the MySQL connection information. The file is stored in the root directory of the WordPress installation. Since we created the SQL server based off the existing running complete server, we can assume that the username and password combination in this file is correct and the rest of the database is compatible. The only thing we need to change is the address of the server.
When we’re picking the IP address we want to connect to, we have a choice. We can either use the public IP of the server (listed on the Rackspace Cloud control panel page for the server) or we can find the private IP of the server and use that instead. The private IP address will force the connection to route over Rackspace’s internal network instead of going over the public Internet and so makes the connection less likely to be sniffed (especially if the servers are in the same datacenter). You can find the private IP address of the SQL server by SSHing into the computer and running “ifconfig.” The private IP will usually be a 10.x.x.x address on eth1.
Here’s the corrected line you need to update in wp-config.php using either the public or private IP:
define('DB_HOST', '10.179.98.192');
Where 10.179.98.192 is the IP address of your SQL server. If you have a DNS record for the SQL server you can slap that in there instead and pop some single quotes around it. Like so:
define('DB_HOST', 'sql20.notaserver.net');
There are a couple of errors that could arise once you make the change to the WordPress configuration, and thanks to my own idiotic stumblings I’ve experienced most of them. Thankfully I also have a solution. So, if you get the following errors on your WordPress site when trying to load it after making the switch, this is what’s going on:
Unable to establish a connection… : The WordPress installation is trying to connect to the newly designated MySQL server, but can’t get it to work. This specific error is indicative of the network connection itself being unable to make it between the servers. Ensure that the IP address or domain name of the server that you specified in the wp-config.php file is correct and responding to pings from the web server itself. Also ensure that the firewall on the SQL server is allowing the traffic.
HTTP 500 Error: Also known as the “internal server error,” this indicates that the connection to the SQL server is up and working, but something else is amiss. Make sure that the account (username and password combination) you’re using in the wp-config.php file is granted the right permissions on the SQL server for the database for the WordPress tables, and also that the account is allowed to be accessed from any IP address and not just the local host.
The site isn’t the latest version: If you spun up the SQL server instance using an older image of your single complete server (for example, if you’ve published new stuff since then) you might not have all of the articles in the SQL server’s database. You can either type them in on the new system or spin up and configure a new SQL server from an image of the current server being sure to tell your writers to hold their horses for a bit.
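To tell a network problem from a permissions problem, it can help to test the connection from the web server without WordPress in the way. A minimal sketch, assuming MySQL listens on its default port 3306; the function name d4p_can_reach_db() is made up for illustration.

```php
<?php
// Hypothetical helper: report whether a TCP connection to the database
// server can be opened. It tests reachability only, not the MySQL login.
function d4p_can_reach_db( $host, $port = 3306, $timeout = 3.0 ) {
    // The @ hides PHP's warning on failure; the return value tells us enough.
    $socket = @fsockopen( $host, $port, $errno, $errstr, $timeout );

    if ( false === $socket ) {
        return false;
    }

    fclose( $socket );
    return true;
}
```

Run var_dump( d4p_can_reach_db( '10.179.98.192' ) ); on the web server, using your own SQL server's address. A result of false points to the address or the firewall; a result of true while WordPress still fails points to the account or its permissions.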
At this point you should be looking at one server running Apache and PHP with the WordPress files on it, and one server with MySQL and the WordPress content on it, and they should be working together to serve the website.
4. Configure cloud storage solution for images and uploaded files
The next issue is with the images and other files associated with your WordPress site. In a normal configuration all of the media files are stored on the webserver itself, but because we now have multiple webservers running it’s very hard to duplicate the files across both servers efficiently. Thankfully Rackspace’s Cloud Files solution offers a great alternative storage method for all the images you want to post on your blog, and a WordPress plugin is available to do all of the hard work for you. CDN Tools is a third party plugin that will automatically upload all of your files to the Rackspace Cloud Files service and do the magic to make them appear in your posts. An added benefit is that because the images are being served from a content delivery network — instead of the server itself — it reduces the load on your webservers and makes the site load faster.
5. Configure Load Balancer and DNS records
Now we just need to set up the load balancer.
Think of your server configuration as a straight line. On one end you have the Internet, where your readers will come from to visit the site. The first thing they should hit on their way into your network is the load balancer, which will direct their traffic to the appropriate server based on some parameters we’ll set up. Next in line are the web servers, the dumb systems that simply serve static HTML or PHP pages and respond on port 80. Those systems talk to the “brains” of the operation, which is last in line, the SQL server that listens on port 3306. The SQL server sends the webservers whatever content the reader wants to see; the webservers spice it up and make a presentable web page based on the WordPress files; and the load balancer remembers where the traffic came from and sends it right back to them. Now that we have the infrastructure spun up to handle a load balancer, we need to spin up the load balancer itself.
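How the load balancer picks a server depends on its settings. One common strategy is round-robin, where requests are handed to the nodes in turn. The sketch below only illustrates that idea and is not how Rackspace's load balancer is implemented; the function name and the node addresses are made up.

```php
<?php
// Illustration only: hand out nodes in turn (round-robin).
function d4p_round_robin( array $nodes, $request_number ) {
    return $nodes[ $request_number % count( $nodes ) ];
}

// Hypothetical private addresses of three web servers.
$nodes = array( '10.0.0.11', '10.0.0.12', '10.0.0.13' );

for ( $request = 0; $request < 6; $request++ ) {
    echo 'request ', $request, ' -> ', d4p_round_robin( $nodes, $request ), "\n";
}
```

With three nodes, requests 0 and 3 go to the first node, 1 and 4 to the second, and 2 and 5 to the third.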
Rackspace has a great article on how to spin up a load balancer, and you should follow it to configure the LB. Set the webserver we’ve spun up (the one with Apache still running) as the only node behind the LB for now.
With the load balancer up and running you now have a complete solution. Using the public IP address of the load balancer you can change your DNS records to point your various domains to that LB. Remember to set up Apache with a virtual host in the configuration files of the SERVERS (nodes) for each domain name you want to point towards your load balancer, as the LB will blindly pass the traffic along without altering it. That lets you run as many sites off that one configuration as your heart desires.
At this point you should take an image of the webserver (the node) that is running behind the load balancer. If you see your server starting to slow down, all you have to do is spin up a new instance of that server and designate it as a node in the load balancer’s configuration. Traffic should be balanced between the servers automatically and your site will remain up and responding.
Optimization
Shared Hosting
This is the most common type of hosting. Your site will be hosted on a server along with many others. The hosting company manages the web server for you, so you have very little control over server settings and so on. The areas most relevant to this type of hosting are:
Caching
WordPress Performance
Other areas which may be of interest include:
Offloading
Virtual / Dedicated Server
In this hosting scenario you have control over your own server. The server might be a dedicated piece of hardware or one of many virtual servers sharing the same physical hardware. The key thing is, you have control over the server settings. In addition to the areas above (caching and WordPress performance), the key areas of interest here are:
Server Optimization
Other areas which may be of interest include:
Offloading
Multiple Servers
Once you're dealing with very high traffic situations it may be necessary to employ multiple servers. If you're at this level, you should already have employed all of the applicable techniques listed above.
The WordPress database can be easily moved to a different server and only requires a small change to the config file. Likewise images and other static files can be moved to alternative servers (see offloading).
If you're employing multiple database servers, the HyperDB class provides a drop-in replacement for the standard WPDB class and can handle multiple database servers in both replicated and partitioned structures.
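As a rough idea of what that looks like, here is a sketch of a HyperDB db-config.php with one server that takes writes and one that only serves reads. It is modeled on the sample configuration that ships with HyperDB, so check the file in your copy for the options it supports; the host names are made up.

```php
<?php
// Sketch of a db-config.php for HyperDB. The host names are hypothetical;
// DB_USER, DB_PASSWORD and DB_NAME come from wp-config.php.

// Primary server: handles writes and can also serve reads.
$wpdb->add_database( array(
    'host'     => 'sql-primary.example.net',
    'user'     => DB_USER,
    'password' => DB_PASSWORD,
    'name'     => DB_NAME,
    'write'    => 1,
    'read'     => 1,
) );

// Replica: serves reads only.
$wpdb->add_database( array(
    'host'     => 'sql-replica.example.net',
    'user'     => DB_USER,
    'password' => DB_PASSWORD,
    'name'     => DB_NAME,
    'write'    => 0,
    'read'     => 1,
) );
```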
WordPress MU Optimization
Many of the server-side techniques discussed here also apply to WordPress MU.
This is a quick list of the key areas in which WordPress can be optimized. The biggest gain for the least effort comes from caching.
- Server Optimization
  - DNS onto a separate server
  - Web Server optimization
  - PHP acceleration / optimization
  - MySQL tweaking (query cache, etc)
- WordPress Performance
  - Remove unused plugins
  - Optimize plugins
  - Optimize themes, hardcode static vars, etc
- Offloading
  - Offload static files to separate server
  - Optimized web servers like publicfile, lighttpd, etc
  - Offload feed traffic
- Caching
  - WordPress caching (application level)
  - Browser caching (client side)
  - Web server caching (server side)
- Adding Database Servers