117 lines
7.9 KiB
Plaintext
117 lines
7.9 KiB
Plaintext
|
<?page
|
||
|
wintitle=>a distributed memory object caching system
|
||
|
body<=
|
||
|
|
||
|
<?h1 What is <?memd?>? h1?>
|
||
|
<p><?memd?> is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.</p>
|
||
|
<p><a href="http://www.danga.com/">Danga Interactive</a> developed <?memd?> to enhance the speed of <a href="http://www.livejournal.com/">LiveJournal.com</a>, a site which was already doing 20 million+ dynamic page views per day for 1 million users with a bunch of webservers and a bunch of database servers. <?memd?> dropped the database load to almost nothing, yielding faster page load times for users, better resource utilization, and faster access to the databases on a memcache miss.</p>
|
||
|
|
||
|
<?h1 How it Works h1?>
|
||
|
<p>First, you start up the <?memd?> daemon on as many spare machines as you have. The daemon has no configuration file, just a few command line options, only 3 or 4 of which you'll likely use:
|
||
|
<pre class='example'># ./memcached -d -m 2048 -l 10.0.0.40 -p 11211</pre>
|
||
|
<p>This starts <?memd?> up as a daemon, using 2GB of memory, and listening on IP 10.0.0.40, port 11211. Because a 32-bit process can only address 4GB of virtual memory (usually significantly less, depending on your operating system), if you have a 32-bit server with 4-64GB of memory using PAE you can just run multiple processes on the machine, each using 2 or 3GB of memory.</p>
|
||
|
|
||
|
<?h1 Porting the Application h1?>
|
||
|
<p>Now, in your application, wherever you go to do a database query, first check the memcache. If the memcache returns an undefined object, then go to the database, get what you're looking for, and put it in the memcache:</p>
|
||
|
<pre class='example'>
|
||
|
<div class='exampletitle'>Perl Example (see <a href="apis.bml">APIs page</a>)</div>
|
||
|
sub get_foo_object {
|
||
|
my $foo_id = int(shift);
|
||
|
my $obj = $::MemCache->get("foo:$foo_id");
|
||
|
return $obj if $obj;
|
||
|
|
||
|
$obj = $::db->selectrow_hashref("SELECT .... FROM foo f, bar b ".
|
||
|
"WHERE ... AND f.fooid=$foo_id");
|
||
|
$::MemCache->set("foo:$foo_id", $obj);
|
||
|
return $obj;
|
||
|
}</pre>
|
||
|
|
||
|
<p>(If your internal API was already clean enough, you should only have to do this in a few spots. Start with the queries that kill your database the most, then move to doing as much as possible.)</p>
|
||
|
|
||
|
<p>You'll notice the data structure the server provides is just a dictionary. You assign values to keys, and you request values from keys.</p>
|
||
|
|
||
|
<p>Now, what actually happens is that the API hashes your key to a unique server. (You define all the available servers and their weightings when initializing the API) Alternatively, the APIs also let you provide your own hash value. A good hash value for user-related data is the user's ID number. Then, the API maps that hash value onto a server (modulus number of server buckets, one bucket for each server IP/port, but some can be weighted heigher if they have more memory available).</p>
|
||
|
|
||
|
<p>If a host goes down, the API re-maps that dead host's requests onto the servers that are available.</p>
|
||
|
|
||
|
<?h1 Shouldn't the database do this? h1?>
|
||
|
<p>Regardless of what database you use (MS-SQL, Oracle, Postgres, MysQL-InnoDB, etc..), there's a lot of overhead in implementing <a href="http://www.wikipedia.org/wiki/ACID">ACID</a> properties in a RDBMS, especially when disks are involved, which means queries are going to block. For databases that aren't ACID-compliant (like MySQL-MyISAM), that overhead doesn't exist, but reading threads block on the writing threads.</p>
|
||
|
<p><?memd?> never blocks. See the "Is memcached fast?" question below.</p>
|
||
|
|
||
|
<?h1 What about shared memory? h1?>
|
||
|
|
||
|
<p>The first thing people generally do is cache things within their
|
||
|
web processes. But this means your cache is duplicated multiple
|
||
|
times, once for each mod_perl/PHP/etc thread. This is a waste of
|
||
|
memory and you'll get low cache hit rates. If you're using a
|
||
|
multi-threaded language or a shared memory API (IPC::Shareable, etc),
|
||
|
you can have a global cache for all threads, but it's per-machine. It doesn't scale to multiple machines.
|
||
|
Once you have 20 webservers, those 20 independent caches start to look
|
||
|
just as silly as when you had 20 threads with their own caches on a
|
||
|
single box. (plus, shared memory is typically laden with limitations)</p>
|
||
|
|
||
|
<p>The <?memd?> server and clients work together to implement one
|
||
|
global cache across as many machines as you have. In fact, it's
|
||
|
recommended you run both web nodes (which are typically memory-lite
|
||
|
and CPU-hungry) and memcached processes (which are memory-hungry and
|
||
|
CPU-lite) on the same machines. This way you'll save network
|
||
|
ports.</p>
|
||
|
|
||
|
<?h1 What about MySQL 4.x query caching? h1?>
|
||
|
<p>MySQL query caching is less than ideal, for a number of reasons:</p>
|
||
|
<ul class='spaced'>
|
||
|
|
||
|
<li>MySQL's query cache destroys the entire cache for a given table whenever that table is changed. On a high-traffic site with updates happening many times per second, this makes the the cache practically worthless. In fact, it's often harmful to have it on, since there's a overhead to maintain the cache.</li>
|
||
|
|
||
|
<li>On 32-bit architectures, the entire server (including the query cache) is limited to a 4 GB virtual address space. <?memd?> lets you run as many processes as you want, so you have no limit on memory cache size.</li>
|
||
|
|
||
|
<li>MySQL has a query cache, not an object cache. If your objects require extra expensive construction after the data retrieval step, MySQL's query cache can't help you there.</li>
|
||
|
|
||
|
</ul>
|
||
|
<p>If the data you need to cache is small and you do infrequent updates, MySQL's query caching should work for you. If not, use <?memd?>.</p>
|
||
|
|
||
|
<?h1 What about database replication? h1?>
|
||
|
|
||
|
<p>You can spread your reads with replication, and that helps a lot,
|
||
|
but you can't spread writes (they have to process on all machines) and
|
||
|
they'll eventually consume all your resources. You'll find yourself
|
||
|
adding replicated slaves at an ever-increasing rate to make up for the
|
||
|
diminishing returns each addition slave provides.</p>
|
||
|
|
||
|
<p>The next logical step is to horizontally partition your dataset
|
||
|
onto different master/slave clusters so you can spread your writes,
|
||
|
and then teach your application to connect to the correct cluster
|
||
|
depending on the data it needs.</p>
|
||
|
|
||
|
<p>While this strategy works, and is recommended, more databases (each
|
||
|
with a bunch of disks) statistically leads to more frequent hardware
|
||
|
failures, which are annoying.</p>
|
||
|
|
||
|
<p>With <?memd?> you can reduce your database reads to a mere
|
||
|
fraction, leaving the databases to mainly do infrequent writes, and
|
||
|
end up getting much more bang for your buck, since your databases
|
||
|
won't be blocking themselves doing ACID bookkeeping or waiting on
|
||
|
writing threads.</p>
|
||
|
|
||
|
<?h1 Is <?memd?> fast? h1?> <p>Very fast. It uses <a
|
||
|
href="http://www.monkey.org/~provos/libevent/">libevent</a> to scale
|
||
|
to any number of open connections</a> (using <a
|
||
|
href="http://www.xmailserver.org/linux-patches/nio-improve.html">epoll</a>
|
||
|
on Linux, if available at runtime), uses non-blocking network I/O, refcounts internal objects
|
||
|
(so objects can be in multiple states to multiple clients), and uses
|
||
|
its own slab allocator and hash table so virtual memory never gets
|
||
|
externally fragmented and allocations are guaranteed O(1).</p>
|
||
|
|
||
|
<?h1 What about race conditions? h1?>
|
||
|
<p>You might wonder: <i>"What if the <tt>get_foo()</tt> function adds a stale version of the Foo object to the cache right as/after the user updates their Foo object via update_foo()?"</i></p>
|
||
|
<p>While the server and API only have one way to get data from the cache, there exists 3 ways to put data in:</p>
|
||
|
<ul class='spaced'>
|
||
|
<li><b>set</b> -- unconditionally sets a given key with a given value (<tt>update_foo()</tt> should use this)</li>
|
||
|
<li><b>add</b> -- adds to the cache, only if it doesn't already exist (<tt>get_foo()</tt> should use this)</li>
|
||
|
<li><b>replace</b> -- sets in the cache only if the key already exists (not as useful, only for completeness)</li>
|
||
|
</ul>
|
||
|
Additionally, all three support an expiration time.
|
||
|
|
||
|
<=body
|
||
|
page?>
|