My last article, Do Not Use Memcached As a Data Store, received a fair amount of criticism, most of it along the lines of: “Duh. Of course it’s pretty stupid to use Memcached to store data, as it’s meant, as its name indicates, to be a CACHE stored in MEMORY (Memcached, duh).” Since my message apparently didn’t come across properly, I’ll clarify which use cases are of interest and enumerate the challenges awaiting whoever wants to use Memcached to store data.
Using Memcached not for cache
Memcached was designed for caching data. Still, some people didn’t stop at the name: since Memcached is essentially a distributed hash table, they figured it could serve use cases other than caching, and have gotten pretty creative with it. A few examples of how it could be used:
- a highly scalable shard index
- a web session storage mechanism
- a cache for the results of heavy computations that is also persisted (to disk)
Memcached is often considered in non-cache usages because:
- it’s blazingly fast
- it scales easily to tens of thousands of operations per second per server
- it’s very versatile since it’s just a key/value store
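To make the “just a key/value store” point concrete, here is a minimal sketch of the interface involved. `FakeMemcache` is a hypothetical in-memory stand-in written for illustration; a real client library exposes the same kind of `set`/`get`/`delete` calls over the network.

```python
import json

class FakeMemcache:
    """Hypothetical in-memory stand-in for a Memcached client.
    A real client talks to a memcached server but offers the same
    set/get/delete interface."""
    def __init__(self):
        self._data = {}

    def set(self, key, value, expire=0):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        return self._data.pop(key, None) is not None

client = FakeMemcache()

# A shard index: map a user id to the shard holding its data.
client.set("shard:user:42", "db-03")

# A web session: serialize the session dict into a string value.
client.set("session:abc123", json.dumps({"user_id": 42, "cart": ["sku-7"]}))

print(client.get("shard:user:42"))                       # db-03
print(json.loads(client.get("session:abc123"))["cart"])  # ['sku-7']
```

Because the interface is only keys and values, anything you can serialize to a string fits — which is exactly what makes the non-cache use cases above tempting.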
Storing semi-persistent data in Memcached
For persistent data, Memcached is clearly not an option: the risk of losing data over a long period of time is too high. But what about semi-persistent data? (Semi-persistent data being data you can afford to lose in case of failure, but not in normal operation.)
In other words:
Does Memcached offer enough guarantees to store semi-persistent data with acceptable risks?
The answer is probably no, unless you go to great lengths to work around its design. Here are a few things you can do to reduce the probability and the consequences of data loss:
- reserve a dedicated Memcached server for your non-cache usage: this gives you more control and a better guarantee that the cache won’t fill up (and thus evict keys)
- set an infinite expiry on keys (an expiry of 0 means “never expire”), or whatever expiry you need
- implement a backup solution should a Memcached server go down. Two possible options:
- periodically persist data to disk, from the client: for example, on every 10th write (by comparison, Redis, another key/value store, periodically persists data to disk server-side). This way you don’t lose everything when a Memcached server goes down. How often you should do this depends on how much data you can afford to lose versus how much of a performance hit you can afford to take.
- write data to redundant Memcached servers (one acting as the master, the other as the failover).
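The two backup options above can be sketched together as follows. `FakeMemcache` and `SemiPersistentStore` are hypothetical helpers written for illustration, not real Memcached features: a real deployment would wrap an actual client library pointed at two memcached servers.

```python
import json
import os
import tempfile

class FakeMemcache:
    """Hypothetical in-memory stand-in for a Memcached client."""
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

class SemiPersistentStore:
    """Sketch of the two mitigations: every write goes to a master and a
    failover server, and every Nth write snapshots a client-side copy of
    the data to disk."""
    def __init__(self, master, failover, snapshot_path, every=10):
        self.master, self.failover = master, failover
        self.snapshot_path, self.every = snapshot_path, every
        self.writes = 0
        self.local = {}  # client-side copy used for the periodic snapshot

    def set(self, key, value):
        self.master.set(key, value)
        self.failover.set(key, value)  # redundant write
        self.local[key] = value
        self.writes += 1
        if self.writes % self.every == 0:
            with open(self.snapshot_path, "w") as f:
                json.dump(self.local, f)

    def get(self, key):
        value = self.master.get(key)
        if value is None:  # master lost the key (or is down): try failover
            value = self.failover.get(key)
        return value

master, failover = FakeMemcache(), FakeMemcache()
snap = os.path.join(tempfile.gettempdir(), "memcache_snapshot.json")
store = SemiPersistentStore(master, failover, snap, every=10)

for i in range(10):
    store.set(f"k{i}", i)   # the 10th write triggers a disk snapshot

master._data.clear()        # simulate the master going down
print(store.get("k3"))      # still served, by the failover: 3
```

Note the trade-off made explicit by the `every` parameter: a lower value means less data at risk but more disk writes on the hot path.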
Environment control and key eviction
Even if you controlled the environment by following the previous steps, you would have no guarantee that keys would not be evicted. They probably will be. This is in line with the design of Memcached as a cache, as commenters on my previous post have pointed out. What I originally wanted to point out is that the problem is not the fact that the data is stored in memory, nor that the cache could fill up, nor that keys expire.
What comes back to bite us here is an implementation detail: the slab allocator.
Memcached uses a slab allocator to avoid memory fragmentation: it reserves chunks of memory for objects based on their size. As a consequence, your keys can be evicted even when memory is not full. That, my friend, means you have absolutely no control over the eviction of your keys. Let me repeat: if you consider the design of Memcached as a cache, this makes sense – but it comes as a surprise to those of us who tried to use it as a general key/value store, since this detail is not obvious from reading the Memcached documentation.
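The mechanism is easier to see with numbers. The sketch below generates size classes the way a slab allocator does, growing each chunk size by a fixed factor; the 96-byte minimum and 1.25 growth factor are assumptions chosen for illustration (run `memcached -vv` to see the actual classes of your build). The key point: an item is stored in the smallest class that fits it, and eviction happens within that class – free memory reserved for other classes doesn’t help.

```python
def slab_classes(min_chunk=96, factor=1.25, max_chunk=1024 * 1024):
    """Generate slab chunk sizes: each class is `factor` times larger
    than the previous one, rounded up to 8-byte alignment (illustrative
    defaults, assumed rather than taken from a specific memcached build)."""
    sizes, size = [], min_chunk
    while size < max_chunk:
        sizes.append(size)
        size = int(size * factor)
        if size % 8:
            size += 8 - size % 8  # align to 8 bytes
    return sizes

def class_for(item_size, sizes):
    """Return the chunk size of the smallest class that fits the item."""
    for s in sizes:
        if item_size <= s:
            return s
    return None  # larger than the biggest chunk: not storable

sizes = slab_classes()
print(sizes[:4])            # [96, 120, 152, 192]

# A 100-byte item lands in the 120-byte class. If that class's chunks
# are all in use, an item from THAT class gets evicted – even if every
# other class has free memory.
print(class_for(100, sizes))  # 120
```

So “memory is not full” is measured per size class, not globally, which is exactly why eviction can look arbitrary when you use Memcached as a store.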
A much better way to store semi-persistent data