Re: [PATCH 4/4] fdtable: Implement new pagesize-based fdtable allocation scheme.

From: Eric Dumazet
Date: Mon Oct 02 2006 - 13:25:43 EST


On Monday 02 October 2006 19:00, Vadim Lobanov wrote:
> On Monday 02 October 2006 01:52, Eric Dumazet wrote:
>
> > Current scheme is to allocate power of two sizes, and not 'the smallest
> > that accommodates the requested fd count'. This is for a good reason,
> > because we don't want to call vmalloc()/vfree() each time a process opens
> > 512 or 1024 more files (x86_64 or ia32)
>
> Yep, that is most definitely a consideration. I was balancing it against
> the fact that, when the table becomes big, growing it by a power of two
> regardless of the size results in massive memory usage deltas. The worry
> here is that an application may likely cause the table to grow by a huge
> amount, due to the power-of-two increase, and then actually use only a
> modest number of further fds, wasting the rest of the allocated table
> memory.
>
> Which applications open so many file handles so quickly? Do they actually
> need the amortized power-of-two table area increase? In those cases, would
> the actual process of opening these files take more time than growing the
> table in fixed-size steps? Or at least outweigh it enough that it would be
> more preferable to try to reduce memory waste instead of improve file open
> time?

I think that for such applications, the 'waste' of ram for fd table is nothing
compared to the ram cost of opened files/sockets/dentries/inodes.

>
> Is it really true that it will create less fragmentation? It seems to me
> that this will only be the true if most of the other heavy users of vmalloc
> also tried to use power-of-two allocation sizes.

I am quite sure that on my machines, big vmalloc users are fdtable most of the
time.

>
> What do you think of Andi Kleen's follow-up suggestion about eliminating
> vmalloc use altogether?

It would be interesting, but would need one indirection level, that could kill
performance of said huge applications... For such applications, the cost of
expanding fdtable is nothing compared to the cost of
open()/close()/poll()/read()/write() calls. This is because fdtable never
shrinks. So adding one indirection (if you use a table of pointers to PAGES
containing 1024 or 512 (struct file *)).

At least, expanding fdtable by 1024 slots would need to reallocate the first
table (adding one void *), and allocating one PAGE only. No need to copy
previous pages, this is a win.

Of course, big fdset still need vmalloc(), or else we cannot use
find_next_zero_bit() anymore... And for such applications, the time and
memory scanned to find a zero bit at open() time is probably the killer
(touching a large part of cpu caches). One million bits is 128 KB...

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/