Improving performance with return value caching

Many functions (and methods) in a project will often provide the same return value for the same arguments, like:

  • mathematical functions:
function someMaths($x)
{
	return $x + pow($x, 3.2) - cos($x);
}
  • functions which retrieve content from a file:

function getConfiguration()
{
	return parse_ini_file('configuration.ini');
}
  • functions which retrieve content from a database:
function getArticleById($id)
{
	$sqlId  = mysql_real_escape_string($id);
	$result = mysql_query("SELECT `id`, `title` FROM `article` WHERE `id` = '$sqlId' LIMIT 1");

	if (false === $result) {
		throw new Exception('Query failed.');
	}

	return mysql_fetch_assoc($result);
}

If your project contains a function like one of these, and:

  • your preferred profiling tool reveals that a lot of the execution time is spent in this function,
  • its return value is always the same during a single run (script execution) when given the same arguments,
  • this function is called more than once per run with the same arguments,

then consider caching (saving) its return values.

Here is how it can be achieved (yes, there are more advanced techniques, but they are not the point of this post): add a static variable to the body of the function. It holds an associative array that maps each argument combination to its return value:

function getArticleById($id)
{
	static $cache = array();

	// Return value is not in cache yet?
	if (!isset($cache[$id])) {
		$sqlId  = mysql_real_escape_string($id);
		$result = mysql_query("SELECT `id`, `title` FROM `article` WHERE `id` = '$sqlId' LIMIT 1");

		if (false === $result) {
			throw new Exception('Query failed.');
		}

		// Add return value to cache
		$cache[$id] = mysql_fetch_assoc($result);
	}

	// Return cache content
	return $cache[$id];
}

That's it: the (possibly expensive) database query is now executed only once per distinct argument, on the first call with that argument. Every later call with the same argument returns the cached value.
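
When a function takes several arguments, the cache key has to cover the whole combination. Here is a minimal sketch of one way to do it, assuming the arguments can be serialized; `expensivePower()`, `cachedPower()` and the global call counter are hypothetical names used only to make the effect visible:

```php
<?php
// Hypothetical stand-in for an expensive computation. The global counter
// exists only so that the number of real computations can be observed.
$GLOBALS['computeCalls'] = 0;

function expensivePower($base, $exponent)
{
	$GLOBALS['computeCalls']++;

	return pow($base, $exponent);
}

function cachedPower($base, $exponent)
{
	static $cache = array();

	// One key for the whole argument combination.
	$key = serialize(array($base, $exponent));

	// array_key_exists() also counts a cached null as a hit, unlike isset().
	if (!array_key_exists($key, $cache)) {
		$cache[$key] = expensivePower($base, $exponent);
	}

	return $cache[$key];
}

cachedPower(2, 10); // computed
cachedPower(2, 10); // served from the cache
cachedPower(3, 10); // different arguments, computed again

echo $GLOBALS['computeCalls'], "\n"; // prints 2
```

Note that `serialize()` distinguishes types, so `cachedPower(2, 10)` and `cachedPower('2', 10)` end up under different keys.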

Keep in mind that the number of cached return values must stay reasonable (available memory is limited). If there are millions of possible argument combinations for a function in a single run, you’ll have to consider a more elaborate way of optimizing it (this could be the subject of a future post).
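
One simple way to keep memory in check is to cap the number of cached entries. The following is only a sketch of that idea, not the more elaborate approach hinted at above: `loadValue()`, `getValueBounded()` and the `$maxEntries` parameter are hypothetical, and the eviction is plain first-in-first-out rather than anything smarter.

```php
<?php
// Hypothetical stand-in for a slow lookup; the counter is for observation only.
$GLOBALS['loadCalls'] = 0;

function loadValue($id)
{
	$GLOBALS['loadCalls']++;

	return 'value-' . $id;
}

// Assumes $maxEntries is at least 1.
function getValueBounded($id, $maxEntries = 1000)
{
	static $cache = array();

	if (!array_key_exists($id, $cache)) {
		// Cache is full: drop the oldest entry. PHP arrays keep insertion
		// order, so the first key is the one that was added first.
		if (count($cache) >= $maxEntries) {
			reset($cache);
			unset($cache[key($cache)]);
		}

		$cache[$id] = loadValue($id);
	}

	return $cache[$id];
}
```

With a limit of 2, for instance, requesting ids 1, 2 and 3 evicts id 1, so the next request for id 1 hits the slow lookup again.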

Also, always, always, ALWAYS profile your code BEFORE you decide to apply an optimization like this one (and wait for your project to be nearly completed before profiling).
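
If no profiler is at hand, a crude wall-clock measurement can at least give a first impression. This is a rough sketch, not a substitute for a real profiling tool; `averageCallTime()` is a hypothetical helper, and `someMaths()` is repeated from the top of the post so that the snippet runs on its own:

```php
<?php
// Hypothetical helper: average wall-clock seconds per call of $function.
function averageCallTime($function, array $arguments, $iterations = 1000)
{
	$start = microtime(true);

	for ($i = 0; $i < $iterations; $i++) {
		call_user_func_array($function, $arguments);
	}

	return (microtime(true) - $start) / $iterations;
}

// Example function from the beginning of the post.
function someMaths($x)
{
	return $x + pow($x, 3.2) - cos($x);
}

printf("%.9f seconds per call\n", averageCallTime('someMaths', array(2.5)));
```

Numbers obtained this way vary from run to run, so compare before and after on the same machine and under the same load.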

5 comments

  1. Yeah, it is a very neat trick that is used in many projects, such as Drupal. A real lifesaver at times.

    It has a gotcha, though. There is no way to reset the “cache” from the outside. You have to write it into each and every memoizing function explicitly.

    In your example, this could become important if, for instance, you suddenly want to use getArticleById in a very long-running command-line script (e.g. a nightly cron job), where you might end up calling it thousands of times in the same run. Pretty soon, you would run into memory allocation limits, stale data and what have you.

    In such cases, you will either have to maintain two versions of the function or extend it with some cache manipulation parameters.
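
    The "cache manipulation parameter" option could look roughly like the sketch below. It is only an illustration: `fetchArticleTitle()` is a hypothetical stand-in for the database query, and the `$resetCache` parameter name is an assumption, not something taken from the post.

```php
<?php
// Hypothetical stand-in for the database query; the counter is for observation only.
$GLOBALS['fetchCalls'] = 0;

function fetchArticleTitle($id)
{
	$GLOBALS['fetchCalls']++;

	return 'Article #' . $id;
}

function getArticleTitle($id, $resetCache = false)
{
	static $cache = array();

	// Let the caller empty the cache, e.g. between batches of a long-running job.
	if ($resetCache) {
		$cache = array();
	}

	if (!isset($cache[$id])) {
		$cache[$id] = fetchArticleTitle($id);
	}

	return $cache[$id];
}

getArticleTitle(7);       // fetched
getArticleTitle(7);       // served from the cache
getArticleTitle(7, true); // cache emptied first, fetched again

echo $GLOBALS['fetchCalls'], "\n"; // prints 2
```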

  2. This technique only works if the script makes many calls to the same function within the lifetime of a single server request. The function and its variable scope only last for the lifetime of the current script. The next user/request gains nothing from the caching in your model.

    You're better off using memcached, or some other sort of shared memory storage, if you want caching performance gains for typical web application design patterns.
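
    A shared store is typically used in a "look in the cache first, load and store on a miss" fashion. The sketch below shows that flow under loud assumptions: the `SharedStore` interface, `ArrayStore`, `getArticleShared()` and the 300-second lifetime are all hypothetical, and `ArrayStore` is only an in-process stand-in so that the snippet runs without any extension. In a real setup, an implementation backed by memcached or another shared memory store would take its place.

```php
<?php
// Hypothetical minimal interface for a store shared between requests.
interface SharedStore
{
	public function get($key);               // returns null on a miss
	public function set($key, $value, $ttl); // $ttl: lifetime in seconds
}

// In-process stand-in used for demonstration only: it is NOT shared between
// requests and it ignores $ttl.
class ArrayStore implements SharedStore
{
	private $items = array();

	public function get($key)
	{
		return array_key_exists($key, $this->items) ? $this->items[$key] : null;
	}

	public function set($key, $value, $ttl)
	{
		$this->items[$key] = $value;
	}
}

function getArticleShared(SharedStore $store, $id, $loader)
{
	$key     = 'article:' . $id;
	$article = $store->get($key);

	// Miss: load the article and keep it for later requests.
	if (null === $article) {
		$article = call_user_func($loader, $id);
		$store->set($key, $article, 300);
	}

	return $article;
}
```

    Unlike a static variable, such a store can go stale while other requests change the data, which is why the sketch passes an expiry time along.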
