The SLUB Allocator
An animated field guide to how the Linux kernel parcels out small objects — without thrashing, without locking, and without claiming more memory than it needs.
A million tiny objects, every second.
A running kernel is a blizzard of small allocations — file descriptors, network buffers, dentries, task structures. Asking the page allocator (which deals in 4 KB units) for each one would shred memory, cache lines, and CPU cycles. The slab allocator's job is to make those allocations cheap and dense.
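For scale, this is the whole client-side API most kernel code ever sees. The struct below (flow_entry) is a made-up example, but the shape is typical: a few dozen bytes, allocated and freed constantly, far too small to deserve a page of its own.

#include <linux/types.h>
#include <linux/slab.h>              /* kmalloc, kzalloc, kfree */

/* Hypothetical example struct, ~24 bytes of payload. */
struct flow_entry {
    u64 last_seen;
    u32 src_ip, dst_ip;
    u16 src_port, dst_port;
};

static int track_flow(void)
{
    /* Served from one of the small kmalloc-N caches, not from whole pages. */
    struct flow_entry *e = kzalloc(sizeof(*e), GFP_KERNEL);

    if (!e)
        return -ENOMEM;
    /* ... use e, then hand it straight back to its slab ... */
    kfree(e);
    return 0;
}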
A cache, per CPU, per node.
Each kind of object lives in its own kmem_cache. Inside the cache, every CPU gets its own private kmem_cache_cpu with an active slab and a small partial-slab cache that no other CPU may touch.
One step out, every NUMA node gets a kmem_cache_node that holds a longer list of partial slabs — shared, but rarely contended.
A slab itself is one or more contiguous pages, sliced into N equally-sized objects.
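A trimmed-down sketch of those three levels. The field names follow mm/slub.c, but the real structs carry many more members than shown here:

struct kmem_cache_cpu {              /* one per CPU, touched without locks */
    void *freelist;                  /* head of the active slab's free chain */
    struct slab *slab;               /* the active slab */
    struct slab *partial;            /* small private stash of partial slabs */
};

struct kmem_cache_node {             /* one per NUMA node */
    spinlock_t list_lock;            /* the only lock in the whole story */
    unsigned long nr_partial;
    struct list_head partial;        /* longer, shared list of partial slabs */
};

struct kmem_cache {                  /* one per object type, e.g. "dentry" */
    struct kmem_cache_cpu __percpu *cpu_slab;
    unsigned int size;               /* object size, metadata included */
    unsigned int object_size;        /* what the caller asked for */
    struct kmem_cache_node *node[MAX_NUMNODES];
};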
Free objects are the metadata.
The first machine word of every free object holds a pointer to the next free object in the same slab. The chain ends in NULL. The cache stores only the head.
Allocate? Pop the head. Free? Push the head. No bitmaps. No external bookkeeping. The freelist threads through memory the CPU was about to touch anyway, so the next-pointer is hot in cache by the time we need it.
(Modern kernels XOR-hash the pointer with a per-cache cookie — CONFIG_SLAB_FREELIST_HARDENED — to break heap-overflow exploits. Same data structure, defanged.)
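In code, the chain is nothing more than a load and a store at a fixed offset inside the free object. This is a simplified rendering of the kernel's get_freepointer()/set_freepointer() helpers:

/* s->offset says where the next-pointer hides inside a free object
 * (0 in the simplest case, i.e. the first machine word). */
static inline void *get_freepointer(struct kmem_cache *s, void *object)
{
    return *(void **)((char *)object + s->offset);
}

static inline void set_freepointer(struct kmem_cache *s, void *object, void *next)
{
    *(void **)((char *)object + s->offset) = next;
}

/* With CONFIG_SLAB_FREELIST_HARDENED the stored word is obfuscated, roughly
 * next ^ s->random ^ swab(address-of-the-word), so an overflow that clobbers
 * it cannot plant a pointer the allocator will later trust. */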
kmalloc() in four loads.
The overwhelming majority of allocations resolve here. There is no spinlock and no lock-prefixed atomic, just a CPU-local pointer dance that this_cpu_cmpxchg_double keeps consistent against preemption and migration.
void *slab_alloc(struct kmem_cache *s)
{
    /* Per-CPU state; the real kernel pairs this with a transaction id (tid)
     * and commits via this_cpu_cmpxchg_double() so migration can't tear it. */
    struct kmem_cache_cpu *c = raw_cpu_ptr(s->cpu_slab);
    void *obj = c->freelist;

    if (likely(obj)) {
        c->freelist = get_freepointer(s, obj);  /* pop the head */
        return obj;                             /* done. */
    }
    return __slab_alloc(s, c);                  /* slow path */
}
Demote, promote, repeat.
If the active slab's freelist is NULL, the slow path takes over. The active slab is simply dropped — SLUB doesn't track full slabs, which is half the reason it's leaner than SLAB.
A slab is then promoted from the per-CPU partial list. Promotion is still lockless: a cmpxchg pulls the head off the partial chain.
If the per-CPU partial list is empty, we drop into the per-node partial list (with one spinlock — but contention here is rare). Only if every list is empty do we ask the page allocator for fresh memory.
- Active slab's freelist hits NULL
- Active slab is detached (no longer tracked)
- Head of CPU partial list is promoted to active
- Allocation proceeds against the new active slab
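Condensed into code, the demote/promote dance looks roughly like this. It is a sketch, not the real function: get_partial_node_or_new_slab() is a stand-in name for logic that mm/slub.c spreads across several routines.

static void *__slab_alloc(struct kmem_cache *s, struct kmem_cache_cpu *c)
{
    struct slab *slab = c->partial;

    c->slab = NULL;                       /* drop the exhausted active slab;
                                             there is no full-slab list to put it on */
    if (slab) {
        c->partial     = slab->next;      /* promote: a cmpxchg in the real kernel */
        c->slab        = slab;
        c->freelist    = slab->freelist;  /* this CPU now owns the slab's objects */
        slab->freelist = NULL;
    } else {
        /* Per-node partial list under node->list_lock, else the page allocator. */
        c->slab     = get_partial_node_or_new_slab(s, c);
        c->freelist = c->slab->freelist;
        c->slab->freelist = NULL;
    }
    return slab_alloc(s);                 /* retry: the fast path now succeeds */
}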
A blank page, carved.
When neither the per-CPU nor the per-node partial lists can spare a slab, the cache asks the underlying page allocator (the buddy allocator) for fresh contiguous pages.
The pages are sliced into equally sized objects (the cache's object size, rounded up to its alignment). The first word of each future object is wired into a freelist chain — obj[0] → obj[1] → obj[2] → … → NULL — and the result is installed as the new active slab.
The cost is real but rare; once the slab exists, all subsequent allocations from it are back on the lockless fast path.
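A sketch of the carving step, reusing the set_freepointer() helper from earlier. allocate_pages_for_slab() is a stand-in for the buddy-allocator call, and the object-count and size fields are simplified from the real layout:

static struct slab *new_slab(struct kmem_cache *s, gfp_t flags)
{
    struct slab *slab = allocate_pages_for_slab(s, flags);   /* buddy allocator */
    char *start = slab_address(slab);                        /* first byte of the slab */
    char *p;

    /* Wire obj[0] -> obj[1] -> ... -> NULL through the fresh memory. */
    for (p = start; p < start + (slab->objects - 1) * s->size; p += s->size)
        set_freepointer(s, p, p + s->size);
    set_freepointer(s, p, NULL);                             /* last object ends the chain */

    slab->freelist = start;                                  /* head of the new chain */
    return slab;
}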
Push the head, mind the page.
Given a pointer, the kernel can find the slab it belongs to in constant time — the page descriptor for any address points back to its kmem_cache.
If the object lives on the current per-CPU active slab, the free is the mirror of the alloc: write the current head into *obj, then store obj as the new head. Lockless.
If the object belongs to some other slab, the per-slab freelist gets the push — and the slab may transition from full to partial (joining the partial list) or even partial to empty (potentially returning to the page allocator).
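The same two cases in code, again simplified: virt_to_slab() is the real constant-time lookup through the page descriptor, while the call to __slab_free() abbreviates a function that takes more arguments and handles the full→partial and partial→empty transitions.

void slab_free(struct kmem_cache *s, void *obj)
{
    struct kmem_cache_cpu *c = raw_cpu_ptr(s->cpu_slab);
    struct slab *slab = virt_to_slab(obj);       /* page descriptor -> owning slab */

    if (slab == c->slab) {
        /* Mirror of the fast alloc: push obj onto the local freelist head. */
        set_freepointer(s, obj, c->freelist);
        c->freelist = obj;
    } else {
        /* Remote slab: push onto that slab's own freelist (cmpxchg in the kernel);
         * may move it full -> partial, or partial -> empty -> back to the buddy. */
        __slab_free(s, slab, obj);
    }
}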
Try it yourself.
A full cache with one active slab and a partial list. kmalloc and kfree at will. Watch full slabs disappear from tracking and empty slabs return to the page allocator when the partial list overflows.
Where to go from here.
The source
mm/slub.c in the Linux kernel is famously dense but readable. Start at __slab_alloc() and work outward. About 4,500 lines.
The original paper
Christoph Lameter, SLUB allocator (2007). The design rationale, written by the author. Short, lucid.
Adjacent allocators
SLAB (removed in 2024), SLOB (aimed at tiny systems, removed in 2023), and the page-level buddy allocator beneath them all.
What we glossed
NUMA-aware promotion, CPU partial-list overflow rules, freelist-pointer hardening, and the kmalloc-N general-purpose caches.
Tracing it live
/sys/kernel/slab/<cache>/ exposes per-cache stats. slabtop, perf trace -e 'kmem:*', and bpftrace are your friends.
Why it matters
Every syscall on the system passes through this code. Saving four cycles here saves them everywhere, on every machine.