On Tue, Nov 02, 2004 at 02:41:15PM -0800, Martin J. Bligh wrote:
> eh? I don't see how that matters at all. After the DMA transfer, all the
> cache lines will have to be invalidated in every CPU's cache anyway, so
> it's guaranteed to be stone-dead zero-degrees-kelvin cold. I don't see
> why how hot it becomes afterwards is relevant.
If the cold page becomes hot, the hot pages in the hot quicklist will
become colder. The cache size is limited, so if something becomes hot,
something else must become cold.
The only difference is that the hot pages will become cold during the
DMA if we return a hot page, or the hot pages will become cold while
the CPU touches the data of the previously cold page, if we return a
cold page. Or are you worried that the cache snooping is measurable?
I believe the hot-cold distinction is mostly important for the hot
allocations, not for the cold ones. That the hot allocations are served
in strict LIFO order truly matters, but the cold allocations are a grey
area.
What kind of slowdown can you measure if you drop __GFP_COLD entirely?
Don't get me wrong, __GFP_COLD makes perfect sense since it costs so
little that it is most certainly worth the branch in the allocator, but
I don't think the hot pages are worth a _reservation_, since they'll
become cold anyway after the I/O has completed, so we could have
returned a hot page in the first place without slowing down in the
buddy to get it.
> If the DMA is to pages that are hot in the CPU's cache - it's WORSE ... we
> have more work to do in terms of cacheline invalidates. Mmm ... in terms
> of DMAs, we're talking about disk reads (i.e. new page allocations) - we're
> both on the same page there, right?
The DMA snoops the cache for the cacheline invalidate, but I didn't
think it was measurable.
I would really like to see the performance difference of disabling
__GFP_COLD for allocations, forcing picking from the head of the list
(while still always freeing the cold pages at the tail); I doubt you
will measure anything.