Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes

From: Mel Gorman
Date: Thu Jan 30 2014 - 05:51:08 EST


On Wed, Jan 29, 2014 at 09:52:46PM -0700, Matthew Wilcox wrote:
> On Fri, Jan 24, 2014 at 10:57:48AM +0000, Mel Gorman wrote:
> > So far on the table is
> >
> > 1. major filesystem overhaul
> > 2. major vm overhaul
> > 3. use compound pages as they are today and hope it does not go
> > completely to hell, reboot when it does
>
> Is the below paragraph an exposition of option 2, or is it an option 4,
> change the VM unit of allocation?

Changing the VM unit of allocation is a major VM overhaul.

> Other than the names you're using,
> this is basically what I said to Kirill in an earlier thread; either
> scrap the difference between PAGE_SIZE and PAGE_CACHE_SIZE, or start
> making use of it.
>

No. The PAGE_CACHE_SIZE would depend on the underlying address space and
vary. The large block patchset would have to have done this but I did not
go back and review the patches due to lack of time. With that it starts
running into fragmentation problems that have to be addressed somehow and
cannot just be waved away.
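
To make the fragmentation concern concrete, here is a minimal sketch of a
cache unit that varies per address space. It assumes 4K base pages; the
struct and helper names are hypothetical, invented for illustration, and
are not taken from the kernel or from the large block patchset. The point
is only that any cache unit larger than the base page means a
higher-order, physically contiguous allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages (shift of 12), as on x86. */
#define BASE_PAGE_SHIFT 12u

/*
 * Hypothetical stand-in for a per-address-space cache unit.
 * This is NOT the kernel's struct address_space.
 */
struct mapping_sketch {
    unsigned int cache_shift;   /* log2 of the cache unit in bytes */
};

/* Size in bytes of one cache unit for this mapping. */
static size_t mapping_cache_size(const struct mapping_sketch *m)
{
    return (size_t)1 << m->cache_shift;
}

/*
 * Buddy-style allocation order for one cache unit: the unit spans
 * (1 << order) physically contiguous base pages.
 */
static unsigned int mapping_alloc_order(const struct mapping_sketch *m)
{
    /* A cache unit smaller than a base page is out of scope here. */
    assert(m->cache_shift >= BASE_PAGE_SHIFT);
    return m->cache_shift - BASE_PAGE_SHIFT;
}
```

With cache_shift = 16 (a 64K unit) this gives order 4, i.e. 16 contiguous
base pages for every unit cached, and that is the kind of allocation that
becomes harder to satisfy as memory fragments.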

> The fact that EVERYBODY in this thread has been using PAGE_SIZE when they
> should have been using PAGE_CACHE_SIZE makes me wonder if part of the
> problem is that the split in naming went the wrong way. ie use PTE_SIZE
> for 'the amount of memory pointed to by a pte_t' and use PAGE_SIZE for
> 'the amount of memory described by a struct page'.
>
> (we need to remove the current users of PTE_SIZE; sparc32 and powerpc32,
> but that's just a detail)
>
> And we need to fix all the places that are currently getting the
> distinction wrong. SMOP ... ;-) What would help is correct typing of
> variables, possibly with sparse support to help us out. Big Job.
>

That's taking the approach of the large block patchset (as I understand
it, not reviewed, not working on this etc) without dealing with potential
fragmentation problems. Of course they could be remapped virtually if
necessary, but that will be very constrained on 32-bit, the final transfer
to hardware will require scatter/gather, and there is a setup/teardown
cost with virtual mappings, such as faulting (setup) and IPIs to flush TLBs
(teardown), that would add overhead.

--
Mel Gorman
SUSE Labs