Very long URL aliases not correctly cached in memcache

For high-traffic Drupal sites, the memcache module is a common way to make the site scale. The module relies on the memcached object caching daemon to do the actual caching.

In this article, we describe how we ran into a limitation of memcached and how we overcame it with a simple patch.

We have a client who uses Drupal's very useful URL alias feature for their news articles. Because this is a busy site, popular articles collect many pages of comments. Users on the site started complaining that they could not get to the second or later pages of comments on such nodes: they were always served the first page of comments.

A long, but not unusual, node URL on this site is about 109 characters, excluding the protocol (http://), the domain name, and the ?page=2 at the end. All told, the URL comes to about 136 characters.

The catch is that this client's content is in a language whose alphabet falls outside the 1-byte Latin character sets: Greek, stored as Unicode (UTF-8). UTF-8 uses 2 bytes per character for Greek and many other non-Latin scripts.

Because the cache key is encoded with PHP's urlencode() function, each of those bytes becomes a three-character %XX escape, so every Greek letter grows to six characters (for example, the two letters ελ become %CE%B5%CE%BB). This means that our 136-character URL ends up being 340 characters!

Why is that a problem, you may ask?

Well, memcached limits keys to 250 bytes, as defined in its memcached.h header:

/** Maximum length of a key. */
#define KEY_MAX_LENGTH 250

Contrary to some misleading comments on Google Groups, the PECL memcache extension does not hash keys, and it imposes the same 250-byte limit in php_memcache.h:

/* stoled from memcached sources =) */ 
#define MMC_KEY_MAX_SIZE 250 

There is even code in PECL that silently truncates the key to 250 bytes:

*result_len = key_len < MMC_KEY_MAX_SIZE ? key_len : MMC_KEY_MAX_SIZE;

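Now we have a problem! Once the encoded key is longer than 250 bytes, the ?page=2 or ?page=6 at its end is cut off, so every page of comments maps to the same truncated key. This is why all those pages come back as the first page (or whichever page got cached first).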

But we also have a solution.

We patched the memcache module's dmemcache_key() function to use an MD5 hash of the key instead of the URL itself. This is safe because Drupal never browses keys in memcached; in fact, memcached does not allow browsing by key. So every time we store a key (a URL), we store it under its MD5 hash instead, and every time we look up a URL, we compute the same MD5 hash and use that. Since an MD5 hex digest is always 32 characters, the key stays well under the 250-byte limit, and everything works.

After we applied the patch, users were able to page through the comments normally.

The patch is simply this:

---       5 Dec 2009 00:23:12 -0000
+++       29 Mar 2010 23:24:06 -0000
@@ -239,5 +239,5 @@ function dmemcache_key($key, $bin = 'cac
   $full_key = ($prefix ? $prefix. '-' : '') . $bin . '-' . $key;

-  return urlencode($full_key);
+  return md5(urlencode($full_key));

The patch has been submitted to the memcache module's issue queue as #756926.