A Simple PHP HTTP Object Cache

As the result of some work on a web service client for a major American e-tailer over the summer of 2005, I needed a simple HTTP object cache to improve performance for clients and reduce the load on the server. The obvious choice is something like the PEAR Cache_Lite module, but that is quite a lot of code for the limited functionality that the project required.

This paper highlights the essential elements of that work.

Why Cache?

A cache is basically a faster-loading copy of an object. It loads faster either because it is physically closer to the client, reducing network latency, or because it is held in a faster storage medium. Caches are used extensively on the Internet. Browsers cache HTML and image data for individual users. Proxies cache frequently referenced data for a pool of users, and specialized caches, such as the Squid web proxy cache, are used as a front end to web servers to speed up page fetching. This is particularly useful where the objects are generated dynamically, such as Wiki pages, but do not change often. Generating dynamic pages requires code to be executed, and this may pull in information from an RDBMS or even from other, remote servers via Web Services.

Caching can also be implemented in the code that generates pages. The procedure is basically one of:

  • check if page is in cache
  • generate page if necessary
  • return page to user
  • store page locally
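
The steps above can be sketched as a single function. This is only an illustration: serve_cached() and the $generate callback are hypothetical names, not part of the original project, and in this sketch the copy is stored just before the page is handed back to the caller for output.

```php
<?php
// Hypothetical helper: returns the page body, using the cache when possible.
// $generate is an assumed callback that builds the page and returns it as a string.
function serve_cached(string $cacheFile, int $maxAge, callable $generate): string
{
    // Check whether a sufficiently fresh copy of the page is in the cache.
    if (is_file($cacheFile) && filemtime($cacheFile) + $maxAge > time()) {
        $cached = file_get_contents($cacheFile);
        if ($cached !== false) {
            return $cached;
        }
    }

    // Generate the page, because no usable copy exists.
    $page = $generate();

    // Store the page locally for later requests.
    file_put_contents($cacheFile, $page, LOCK_EX);

    // Return the page so the caller can send it to the user.
    return $page;
}
```

A caller would typically echo the returned string, for example echo serve_cached('cache/page.html', 86400, 'build_page'); where build_page is whatever function produces the page.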

Caching has the advantage of bypassing all the page generation processing, which speeds up access to frequently used resources, but at the expense of adding some more code and duplicating the data stored on the server. There may also be problems created by serving stale data to the end user.

The storage problem can be reduced somewhat by compressing the data before it is saved in the cache. This also means that, for browsers that support decompression, the compressed copy of the object can be returned directly to the end user, which reduces the amount of data transferred. Compression is a trade-off of CPU power against network bandwidth and storage. As most web hosting companies only charge for bandwidth and storage - CPU cycles being viewed as too cheap to meter - this is a good trade-off for most people.
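
As a rough illustration of the saving, the following sketch compresses a made-up, repetitive page body with PHP's zlib functions and compares the sizes. The sample markup is an assumption chosen for the example; real pages will compress by different amounts.

```php
<?php
// Illustrative page body only; repetitive markup like this compresses well.
$page = str_repeat("<li><a href=\"/wiki/people\">People</a></li>\n", 200);

// gzencode() produces gzip-format data, the same format a browser accepts
// when the response carries "Content-Encoding: gzip".
$compressed = gzencode($page, 9);

printf("original: %d bytes, gzip: %d bytes\n", strlen($page), strlen($compressed));

// The compressed copy decodes back to the original bytes.
$restored = gzdecode($compressed);
```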

The Code

The code excerpts contain the essentials of a workable caching system, with the exception of garbage collection. Garbage collection is needed because real objects may change location, so eventually the cache would fill up with stale objects. A sweeper could be run periodically to remove objects that are older than the cache storage time.
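
A sweeper of this kind might look like the sketch below. It is not part of the original code: sweep_cache() is a hypothetical name, and the sketch assumes that cache objects are files ending in ".gz" under one root directory and that the script is run periodically, for example from cron.

```php
<?php
// Hypothetical sweeper: deletes cached ".gz" files whose last modified time
// is more than $maxAge seconds ago. Returns the number of files removed.
function sweep_cache(string $cacheRoot, int $maxAge = 86400): int
{
    $cutoff = time() - $maxAge;
    $stale  = [];

    // Walk the whole cache tree; "." and ".." entries are skipped.
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($cacheRoot, FilesystemIterator::SKIP_DOTS)
    );

    // Collect stale paths first so nothing is deleted while iterating.
    foreach ($files as $file) {
        if ($file->isFile() && $file->getExtension() === 'gz' && $file->getMTime() < $cutoff) {
            $stale[] = $file->getPathname();
        }
    }

    $removed = 0;
    foreach ($stale as $path) {
        if (unlink($path)) {
            $removed++;
        }
    }

    return $removed;
}
```

Empty directories are left in place by this sketch; removing them would need a second pass.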

The first step is to check that the cache directory exists and is writable:

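if (!is_writable($cache_path)) {
  // is_writable() also returns false when the directory does not exist.
  // The path is escaped before being echoed into the HTML warning.
  echo '<p style="background: #ff3939; font-weight: 600; color: yellow;">WARNING: The cache directory located at ' . htmlspecialchars($cache_path) . ' needs to have the permissions set to read/write/execute for everyone in order to work properly.</p>';
}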
 

The URL of the requested object now needs to be split into a directory part and a file part. On most PHP content management systems the URL doesn't correspond to a real object on disk but to a location in a database, for example:

http://mydomain.com/wiki/people/sportsmen/donbradley

may correspond to the program

http://mydomain.com/wiki.php

and the location

people/sportsmen/donbradley

would be used by the program to access the information about Don Bradley in the people and sportsmen tables of a database. It is convenient to use this information for our cache location:

$cachedir    = 'cache/people/sportsmen';  // directory part
$cacheobject = 'donbradley';              // file part
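
One way to derive the two parts from the location is sketched below. The function name split_cache_path(), the "cache" root and the "index" fallback are assumptions made for the example. Because the location comes from the URL, the sketch keeps only segments made of letters, digits, "_" and "-", so that a segment such as ".." cannot point outside the cache directory.

```php
<?php
// Hypothetical helper: splits a location such as "people/sportsmen/donbradley"
// into a cache directory and a cache object name.
function split_cache_path(string $location, string $cacheRoot = 'cache'): array
{
    // Keep only safe path segments; empty and unsafe segments are dropped.
    $segments = [];
    foreach (explode('/', $location) as $segment) {
        if (preg_match('/\A[A-Za-z0-9_-]+\z/', $segment)) {
            $segments[] = $segment;
        }
    }

    // Assumed fallback for an empty location, e.g. the site's front page.
    if ($segments === []) {
        return [$cacheRoot, 'index'];
    }

    // The last segment is the object, everything before it is the directory.
    $cacheobject = array_pop($segments);
    $cachedir    = $segments === [] ? $cacheRoot : $cacheRoot . '/' . implode('/', $segments);

    return [$cachedir, $cacheobject];
}
```

With the example location, list($cachedir, $cacheobject) = split_cache_path('people/sportsmen/donbradley'); gives the directory cache/people/sportsmen and the object donbradley.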

We now create the cache directory using the mmkdir function, which creates a directory tree by recursive calls to the standard mkdir() function. This code can be found on the net.

mmkdir($cachedir);
$cache_filename = $cachedir . "/" . $cacheobject . ".gz";
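
The source of mmkdir is not reproduced here, so the following is only a guess at what such a helper looks like: it creates missing parent directories first and then the directory itself. The default mode of 0775 is an assumption.

```php
<?php
// Assumed reconstruction of a recursive directory helper, not the original code.
function mmkdir(string $path, int $mode = 0775): bool
{
    // Nothing to do if the directory already exists.
    if (is_dir($path)) {
        return true;
    }

    // Create the parent first; dirname() eventually reaches "." or "/",
    // both of which exist, so the recursion terminates.
    $parent = dirname($path);
    if ($parent !== $path && !mmkdir($parent, $mode)) {
        return false;
    }

    // The second is_dir() check covers the case where another request
    // created the directory between the test above and this call.
    return @mkdir($path, $mode) || is_dir($path);
}
```

On PHP 5 and later the built-in function can do this itself: mkdir($path, 0775, true) creates the whole tree in one call.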

If the cache file exists, we check whether it is stale by looking at the last modified time. If it is younger than 86400 seconds (24 hours), we return the file to the client's browser. We first check whether the browser can accept gzip-compressed files directly; otherwise we uncompress the cache object before returning it to the client.

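if (file_exists($cache_filename)) {
  // Serve the cached copy only if it is less than 24 hours old.
  if ((filemtime($cache_filename) + 86400) > time()) {
    // $_SERVER replaces the deprecated $HTTP_SERVER_VARS array.
    $accept = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
    if (strpos($accept, 'gzip') !== false) {
      // The browser accepts gzip, so send the compressed file as it is.
      header('Content-Encoding: gzip');
      readfile($cache_filename);
    } else {
      // Otherwise decompress the cache object while sending it.
      $zp = gzopen($cache_filename, 'r');
      gzpassthru($zp);
      gzclose($zp);
    }

    exit();
  }
}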

Otherwise we enable output buffering with compression. The ob_gzhandler callback automatically determines whether the client can handle compression. The only disadvantage is that page output is delayed until the whole page has been constructed, rather than being sent piecemeal:

ob_start("ob_gzhandler");

The rest of our page generation code goes here. We then get the page contents from the buffer, compress this data and write it to the cache file for later use, before flushing the output to the client's browser:

$buffer = ob_get_contents(); // assign the content of the buffer to $buffer
$zp = gzopen($cache_filename, "w");
gzwrite($zp, $buffer);
gzclose($zp);
ob_end_flush();

It really is that simple. Care may need to be taken with pages that shouldn't be cached, for example pages that are personalized for a logged-in user.
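
One possible guard for such pages is sketched below. The rules in it are assumptions, not part of the original project: it treats only GET requests without a session cookie as cacheable, and is_cacheable_request() and the default cookie name PHPSESSID are illustrative choices.

```php
<?php
// Hypothetical policy check: decides whether a request may use the cache.
// $server and $cookies would normally be $_SERVER and $_COOKIE.
function is_cacheable_request(array $server, array $cookies, string $sessionCookie = 'PHPSESSID'): bool
{
    // Requests other than GET (for example POST) usually change state
    // or depend on submitted data, so they are never cached here.
    $method = $server['REQUEST_METHOD'] ?? 'GET';
    if ($method !== 'GET') {
        return false;
    }

    // A session cookie suggests the page may be personalized for this user.
    if (isset($cookies[$sessionCookie])) {
        return false;
    }

    return true;
}
```

The cache lookup and the later write to the cache file would then both be wrapped in if (is_cacheable_request($_SERVER, $_COOKIE)) { ... }.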