Date: Fri, 5 Sep 2003 20:31:03 +0300
From: Anatoly Vorobey <mellon@pobox.com>
To: memcached@lists.danga.com
Subject: Re: Memory Management...

On Fri, Sep 05, 2003 at 12:07:48PM -0400, Kyle R. Burton wrote:
> prefixing keys with a container identifier). We have just begun to
> look at the implementation of the memory management sub-system with
> regards to its allocation, de-allocation and compaction approaches.
> Is there any documentation or discussion of how this subsystem
> operates? (slabs.c?)

There's no documentation yet, and it's worth mentioning that this
subsystem is the most actively developed area of memcached at the
moment (however, the changes to it won't modify the way memcached
presents itself to clients; they're primarily directed at making
memcached use memory more efficiently).

Here's a quick recap of what it does now and what is being worked
on.

The primary goal of the slabs subsystem in memcached was to eliminate
memory fragmentation issues totally by using fixed-size memory chunks
coming from a few predetermined size classes (early versions of
memcached relied on malloc()'s handling of fragmentation, which proved
woefully inadequate for our purposes). For instance, suppose we decide
at the outset that the list of possible sizes is: 64 bytes, 128 bytes,
256 bytes, etc. - doubling all the way up to 1Mb. For each size class
in this list (each possible size) we maintain a list of free chunks of
this size. Whenever a request comes in for a particular size, it is
rounded up to the closest size class and a free chunk is taken from
that size class. In the above example, if you request 100 bytes of
memory from the slabs subsystem, you'll actually get a 128-byte chunk,
from the 128-byte size class. If there are no free chunks of the
needed size at the moment, there are two ways to get one: 1) free an
existing chunk in the same size class, using LRU queues to free the
least needed objects; 2) get more memory from the system, which we
currently always do in _slabs_ of 1Mb each; we malloc() a slab, divide
it into chunks of the needed size, and use them.

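To make the rounding and the per-class free lists concrete, here's a
small illustrative sketch in C. It is not the actual slabs.c code --
the names (size_class, sc_init, class_for, sc_grow, sc_alloc) are
invented for the example, and the real thing also has to deal with
item headers, LRU eviction and so on:

    /* Toy sketch of size classes doubling from 64 bytes to 1Mb.
     * Invented names; not the real slabs.c. */
    #include <stdlib.h>

    #define SLAB_SIZE (1024 * 1024)   /* grab memory 1Mb at a time   */
    #define NCLASSES  15              /* 64 << 14 == 1Mb             */

    struct size_class {
        size_t chunk_size;            /* 64 << class index           */
        void  *free_list;             /* linked list of free chunks  */
    };

    static struct size_class classes[NCLASSES];

    static void sc_init(void) {
        for (int i = 0; i < NCLASSES; i++) {
            classes[i].chunk_size = (size_t)64 << i;
            classes[i].free_list  = NULL;
        }
    }

    /* round a request up to the smallest class that fits */
    static int class_for(size_t size) {
        for (int i = 0; i < NCLASSES; i++)
            if (size <= classes[i].chunk_size)
                return i;
        return -1;                    /* bigger than 1Mb: can't store */
    }

    /* malloc() a fresh 1Mb slab and carve it into chunks */
    static int sc_grow(struct size_class *sc) {
        char *slab = malloc(SLAB_SIZE);
        if (!slab)
            return 0;
        size_t cs = sc->chunk_size;
        for (size_t off = 0; off + cs <= SLAB_SIZE; off += cs) {
            void *chunk = slab + off;
            *(void **)chunk = sc->free_list;  /* push onto free list */
            sc->free_list = chunk;
        }
        return 1;
    }

    /* sc_alloc(100) hands back a chunk from the 128-byte class */
    static void *sc_alloc(size_t size) {
        int i = class_for(size);
        if (i < 0)
            return NULL;
        struct size_class *sc = &classes[i];
        if (sc->free_list == NULL && !sc_grow(sc))
            return NULL;              /* real memcached would evict
                                         via the LRU queue here      */
        void *chunk = sc->free_list;
        sc->free_list = *(void **)chunk;
        return chunk;
    }

Freeing a chunk is the mirror image: it simply goes back onto the free
list of its class, so memory is never handed back to the system and no
fragmentation can build up.
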
The tradeoff is between memory fragmentation and memory utilisation.
In the scheme we're now using, we have zero fragmentation, but a
relatively high percentage of memory is wasted. The most efficient way
to reduce the waste is to use a list of size classes that closely
matches (if that's at all possible) the common sizes of objects that
the clients of this particular installation of memcached are likely to
store. For example, if your installation is going to store hundreds of
thousands of objects of exactly 120 bytes, you'd be much better off
changing, in the "naive" list of sizes outlined above, the class of
128 bytes to something a bit higher (because the overhead of storing
an item, while not large, will push those 120-byte objects over 128
bytes of storage internally, and will require using 256 bytes for each
of them in the naive scheme, forcing you to waste almost 50% of
memory). Such tinkering with the list of size classes is not currently
possible with memcached, but enabling it is one of the immediate
goals.

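To put rough numbers on that example (using a made-up per-item
overhead of 16 bytes, purely for illustration): a 120-byte object
would then need 136 bytes of storage, which no longer fits a 128-byte
chunk, so it lands in the 256-byte class and wastes 256 - 136 = 120
bytes, i.e. roughly 47% of every chunk it occupies. If the 128-byte
class were bumped up to, say, 144 bytes instead, the same object would
waste only 8 bytes, or about 6%.
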
Ideally, the slabs subsystem would analyze at runtime the common sizes
of objects that are being requested, and would be able to modify the
list of sizes dynamically to improve memory utilisation. This is not
planned for the immediate future, however. What is planned is the
ability to reassign slabs to different classes. Here's what this
means.

Currently, the total amount of memory allocated for each size class is
determined by how clients interact with memcached during the initial
phase of its execution, when it keeps malloc()'ing more slabs and
dividing them into chunks, until it hits the specified memory limit
(say, 2Gb, or whatever else was specified). Once it hits the limit, to
allocate a new chunk it'll always delete an existing chunk of the same
size (using LRU queues), and will never malloc() or free() any memory
from/to the system. So if, for example, during those initial few hours
of memcached's execution your clients mainly wanted to store very
small items, the bulk of the memory allocated will be divided into
small-sized chunks, and the large size classes will get less memory;
as a result, the lifetime of large objects stored in this instance of
memcached will henceforth always be much shorter (their LRU queues
will be shorter and they'll be pushed out much more often). In
general, if your system starts producing a different pattern of common
object sizes, the memcached servers will become less efficient unless
you restart them. Slab reassignment, which is the next feature being
worked on, will give the server the ability to reclaim a slab (1Mb of
memory) from one size class and put it into another size class, where
it's needed more.

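For a rough idea of what reassignment means in terms of the toy
structures sketched above (again, only a conceptual sketch with an
invented name, not the planned implementation), moving one 1Mb slab
from class 'from' to class 'to' could look like this, assuming
everything stored in that slab has already been evicted via the LRU
queues:

    /* Conceptual sketch only; assumes all of this slab's chunks are
     * already sitting on from->free_list. */
    static void sc_reassign(struct size_class *from, struct size_class *to,
                            char *slab /* base address of the 1Mb slab */)
    {
        void **pp = &from->free_list;

        /* 1. unlink every chunk belonging to this slab from the old
         *    class's free list (other slabs' chunks are left alone)  */
        while (*pp) {
            char *chunk = *pp;
            if (chunk >= slab && chunk < slab + SLAB_SIZE)
                *pp = *(void **)chunk;    /* drop this chunk          */
            else
                pp = (void **)chunk;      /* step to the next chunk   */
        }

        /* 2. re-carve the same 1Mb of memory into chunks of the new
         *    size and hand them to the other class                   */
        size_t cs = to->chunk_size;
        for (size_t off = 0; off + cs <= SLAB_SIZE; off += cs) {
            void *chunk = slab + off;
            *(void **)chunk = to->free_list;
            to->free_list = chunk;
        }
    }

The hard parts, of course, are picking which slab to reclaim and
evicting whatever still lives in it, which is what the actual feature
has to take care of.
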
--
avva