With this change, freezing a slab is delayed until the CPU picks it for
active use, at which point it also becomes full. In that case we still
need to rely on the "frozen" bit to avoid manipulating its list. So a
slab is frozen only when it is activated for use, and unfrozen only when
it is deactivated.
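A minimal single-threaded model of the rule above, assuming a heavily simplified slab state. Every name here (struct model_slab, model_activate(), model_free_remote(), model_deactivate()) is hypothetical and is not SLUB's actual code. It shows that a slab which is both full and frozen is left off the node lists by a remote free, while an unfrozen full slab gets linked onto the partial list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified slab state; not the kernel's struct slab. */
struct model_slab {
    int free_objects;  /* objects left on the slab's own freelist */
    bool frozen;       /* owned by a CPU as its active slab */
    bool on_partial;   /* linked on the node partial list */
};

/* Activation: the CPU takes the whole freelist, so the slab becomes
 * full and is frozen at the same moment. The caller is assumed to have
 * already unlinked it from any list. */
static void model_activate(struct model_slab *s)
{
    assert(!s->frozen);
    s->on_partial = false;
    s->free_objects = 0;
    s->frozen = true;
}

/* Free from another CPU. Returns true if the node list was touched. */
static bool model_free_remote(struct model_slab *s)
{
    bool was_full = (s->free_objects == 0);

    s->free_objects++;
    if (s->frozen)
        return false;            /* full but frozen: leave lists alone */
    if (was_full && !s->on_partial) {
        s->on_partial = true;    /* full -> partial: link it */
        return true;
    }
    return false;
}

/* Deactivation: give back unused objects, unfreeze, relink if partial. */
static void model_deactivate(struct model_slab *s, int unused)
{
    assert(s->frozen);
    s->free_objects += unused;
    s->frozen = false;
    if (s->free_objects > 0 && !s->on_partial)
        s->on_partial = true;
}
```

In this model, the "frozen" bit is the only thing distinguishing a CPU-owned full slab from an ordinary full slab that a remote free should relink.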
Interesting solution! I wonder if we could go a bit further and remove
acquire_slab() completely. AFAICS, even after your changes,
acquire_slab() is still attempted, including freezing the slab, which
means we still do a cmpxchg_double under the list_lock, and we now also
have to handle the special case where it fails but we have at least
filled the percpu partial lists.
What if we only filled the partial list without freezing, and then froze the
first slab outside of the list_lock?
Or, more precisely: instead of returning the acquired "object", we would
return the first slab removed from the partial list. I think it would
simplify the code a bit and further reduce list_lock hold times.
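A minimal userspace sketch of the proposed flow, assuming a simplified model: the node partial list is an array guarded by a spinlock standing in for list_lock, and a slab's free count and frozen bit are packed into one atomic word so that a single compare-and-swap stands in for cmpxchg_double. All names (model_get_partial(), model_freeze(), MODEL_PCP_LIMIT, and so on) are hypothetical. Under the lock, slabs are only detached: the first one is returned and a few more go to the percpu partial list. Freezing happens afterwards, outside the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_MAX_SLABS 8
#define MODEL_PCP_LIMIT 2   /* hypothetical cap on percpu partial slabs */

/* Hypothetical slab: (free_objects << 1) | frozen in one word, standing
 * in for the freelist/counters pair updated by cmpxchg_double. */
struct model_slab {
    _Atomic unsigned long state;
};

struct model_node {
    atomic_flag list_lock;                       /* stands in for list_lock */
    struct model_slab *partial[MODEL_MAX_SLABS];
    size_t nr_partial;
};

static void model_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set(l)) {
        /* spin */
    }
}

static void model_unlock(atomic_flag *l)
{
    atomic_flag_clear(l);
}

/* Under list_lock: only detach slabs, no freezing. The first slab is
 * returned; up to MODEL_PCP_LIMIT more are moved to pcp[]. */
static struct model_slab *model_get_partial(struct model_node *n,
                                            struct model_slab **pcp,
                                            size_t *nr_pcp)
{
    struct model_slab *first = NULL;

    *nr_pcp = 0;
    model_lock(&n->list_lock);
    while (n->nr_partial > 0) {
        struct model_slab *s = n->partial[--n->nr_partial];

        if (!first) {
            first = s;
        } else if (*nr_pcp < MODEL_PCP_LIMIT) {
            pcp[(*nr_pcp)++] = s;
        } else {
            n->nr_partial++;   /* percpu list full: leave it on the node */
            break;
        }
    }
    model_unlock(&n->list_lock);
    return first;
}

/* Outside list_lock: atomically take all free objects and set frozen.
 * The CAS loop retries on spurious failure or if a concurrent remote
 * free changed the state. Returns the number of objects taken. */
static unsigned long model_freeze(struct model_slab *s)
{
    unsigned long old = atomic_load(&s->state);

    do {
        assert(!(old & 1UL));   /* only the detaching CPU may freeze it */
    } while (!atomic_compare_exchange_weak(&s->state, &old, 1UL));
    return old >> 1;
}

/* Remote free: bump the free count, preserving the frozen bit. */
static void model_free_remote(struct model_slab *s)
{
    atomic_fetch_add(&s->state, 2UL);
}
```

In this model the returned slab is already off the node list, so only concurrent remote frees can race with the freeze, and the CAS loop absorbs them without holding list_lock.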
I'll also point out a few more details, but this is not a full, detailed
review, as the suggestion above, and another one for 4/5, could mean a
rather significant change for v3.