Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

From: Matthew Wilcox
Date: Tue Dec 08 2020 - 17:33:34 EST


On Tue, Dec 08, 2020 at 02:23:10PM -0800, Dan Williams wrote:
> On Tue, Dec 8, 2020 at 1:51 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Dec 08, 2020 at 01:32:55PM -0800, Ira Weiny wrote:
> > > On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > > > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@xxxxxxxxx wrote:
> > > > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > > > +			       struct page *src_page, size_t src_off,
> > > > > > > > +			       size_t len)
> > > > > > > > +{
> > > > > > > > +	char *dst = kmap_local_page(dst_page);
> > > > > > > > +	char *src = kmap_local_page(src_page);
> > > > > > >
> > > > > > > I appreciate you've only moved these, but please add:
> > > > > > >
> > > > > > > BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > > > >
> > > > > > I imagine it's not outside the realm of possibility that some driver
> > > > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > > > buggy driver and not have a dead system?
> > > > >
> > > > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > > > is on the next page of memory?
> > > >
> > > > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> > > >
> > > > > I suppose ideally ...
> > > > >
> > > > > 	if (WARN_ON(dst_off + len > PAGE_SIZE))
> > > > > 		len = PAGE_SIZE - dst_off;
> > > > > 	if (WARN_ON(src_off + len > PAGE_SIZE))
> > > > > 		len = PAGE_SIZE - src_off;
> > > > >
> > > > > and then we just truncate the data of the offending caller instead of
> > > > > corrupting innocent data that happens to be adjacent. Although that's
> > > > > not ideal either ... I dunno, what's the least bad poison to drink here?
> > > >
> > > > Right, if the driver was relying on "corruption" for correct operation.
> > > >
> > > > If corruption actually were happening in practice, wouldn't there have
> > > > been screams by now? Again, not necessarily...
> > > >
> > > > At least with just plain WARN the kernel will start screaming on the
> > > > user's behalf, and if it worked before it will keep working.
> > >
> > > So I decided to just sleep on this because I was recently told to not introduce
> > > new WARN_ON's[1]
> > >
> > > I don't think that truncating len is worth the effort. The conversions being
> > > done should all 'work'. At least they corrupt users' data in the same way as it
> > > used to... ;-) I'm ok with adding the WARN_ON's and I have modified the patch
> > > to do so while I work through the 0-day issues. (not sure what is going on
> > > there.)
> > >
> > > However, are we ok with adding the WARN_ON's given what Greg KH told me? This
> > > is a bit more critical than the PKS API in that it could result in corrupt
> > > data.
> >
> > zero_user_segments contains:
> >
> > BUG_ON(end1 > page_size(page) || end2 > page_size(page));
> >
> > These should be consistent. I think we've demonstrated that there is
> > no good option here.
>
> True, but these helpers are being deployed to many new locations where
> they were not used before.

So what's your preferred poison?

1. Corrupt random data in whatever's been mapped into the next page (which
is what the helpers currently do)
2. Copy less data than requested
3. Crash
4. Something else
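For comparison, here is a userspace sketch of options 2 and 3, assuming a 4 KiB page and modelling the CONFIG_HIGHMEM=n case where each page is already mapped, so kmap_local_page() is replaced by plain pointers. Everything named here (memcpy_page_sketch, SKETCH_PAGE_SIZE, enum oob_policy, warn_on) is hypothetical and is not kernel API; warn_on() and abort() only stand in for WARN_ON() and BUG_ON(). Option 1 is simply the same copy with no check at all.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PAGE_SIZE; a 4 KiB page is assumed. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical policy selector for options 2 and 3. */
enum oob_policy {
	OOB_TRUNCATE,	/* option 2: warn, then copy less than requested */
	OOB_CRASH,	/* option 3: report and abort, as BUG_ON() would */
};

/* Userspace stand-in for WARN_ON(): log if cond is true, return cond. */
static int warn_on(int cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/* True if [off, off + len) does not fit in one page; cannot wrap around. */
static int crosses_page(size_t off, size_t len)
{
	return off > SKETCH_PAGE_SIZE || len > SKETCH_PAGE_SIZE - off;
}

/* Largest len that keeps [off, off + len) inside one page. */
static size_t room_in_page(size_t off)
{
	return off >= SKETCH_PAGE_SIZE ? 0 : SKETCH_PAGE_SIZE - off;
}

/* Copy between two page-sized buffers; returns the bytes actually copied. */
static size_t memcpy_page_sketch(char *dst_page, size_t dst_off,
				 const char *src_page, size_t src_off,
				 size_t len, enum oob_policy policy)
{
	if (policy == OOB_CRASH) {
		if (crosses_page(dst_off, len) || crosses_page(src_off, len)) {
			fprintf(stderr, "BUG: copy would cross a page boundary\n");
			abort();
		}
	} else {
		/* The src check uses len after any dst clamp, as in the
		 * WARN_ON() suggestion above, so len can only shrink. */
		if (warn_on(crosses_page(dst_off, len), "dst range exceeds page"))
			len = room_in_page(dst_off);
		if (warn_on(crosses_page(src_off, len), "src range exceeds page"))
			len = room_in_page(src_off);
	}
	if (len)
		memcpy(dst_page + dst_off, src_page + src_off, len);
	return len;
}
```

Note that the check is written as `off > SIZE || len > SIZE - off` rather than `off + len > SIZE`, so a huge len cannot wrap the sum and slip past the test. With OOB_TRUNCATE, a request for 200 bytes at dst_off 4000 copies only 96 and logs a warning; with OOB_CRASH the same call aborts.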