Re: very large directories

From: Jesse Pollard (pollard@tomcat.admin.navo.hpc.mil)
Date: Tue Feb 29 2000 - 10:37:11 EST


nbecker@fred.net:
>A coworker has an application that produces 10's of thousands of small
>files. It seems the performance of even a simple operation, such as
>'rm -rf' or find | xargs rm is very very slow.
>
>What are the mechanisms that limit performance? Is it the type of
>data structure used to represent directories?

Yes - the directory is a linear list, searched sequentially, so there is
a tremendous overhead in searching for and opening a file. See if you can
get the application changed. Files created this way usually have dates as
part of the filename - something like 199901231155 for 1999, January 23,
at 11:55 - or some other categorizing pattern. The date case is one we
have had here: data generated every hour gives 24 files/day, 168
files/week, and roughly 8736 files/year. Once a historical archive has
built up, it takes a long time to open a file - on average about 4300
filename compares (half the directory) to locate it. If the files are
split into buckets, say year/month/day.hour (a directory per year, month,
and day, with one file per hour), the average lookup costs 6 compares for
the month (12/2), 15 for the day (about 30/2), and 12 for the hour (24/2),
plus (number of years)/2 for the year directory - around 33 compares
instead of 4300. rm goes faster too, but for a different reason: each
deleted file usually requires the rest of the directory to be copied up
one entry. I will admit I don't know whether Linux compacts the directory
on each delete. If it doesn't, delete time is limited only by the time to
delete each file; if it does, it is also limited by the number of entries
being compacted (the copy takes longer than the delete).
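
As a rough illustration of the bucketing - not the application above,
just a minimal Python sketch assuming the filenames really do look like
YYYYMMDDHHMM, with made-up directory names:

#!/usr/bin/env python3
# Minimal sketch: move flat YYYYMMDDHHMM-named files into a
# year/month/day tree so no single directory grows into the thousands.
# The "archive" source/destination names are made up for illustration.
import os
import shutil

def bucket(srcdir, dstdir):
    for name in os.listdir(srcdir):
        # skip anything that isn't date-named (at least YYYYMMDD)
        if len(name) < 8 or not name[:8].isdigit():
            continue
        year, month, day = name[0:4], name[4:6], name[6:8]
        target = os.path.join(dstdir, year, month, day)
        os.makedirs(target, exist_ok=True)
        shutil.move(os.path.join(srcdir, name),
                    os.path.join(target, name))

if __name__ == "__main__":
    bucket("archive", "archive.bucketed")

With at most 24 entries in each leaf directory, both the name searches at
open time and the per-directory compaction work during rm stay small.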

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.



