Re: [PATCHv3 4/5] mm: make compound_head() robust

From: Paul E. McKenney
Date: Wed Aug 26 2015 - 12:39:00 EST


On Wed, Aug 26, 2015 at 06:04:12PM +0300, Kirill A. Shutemov wrote:
> On Tue, Aug 25, 2015 at 02:19:54PM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 25, 2015 at 10:46:44PM +0200, Vlastimil Babka wrote:
> > > On 25.8.2015 22:11, Paul E. McKenney wrote:
> > > > On Tue, Aug 25, 2015 at 09:33:54PM +0300, Kirill A. Shutemov wrote:
> > > >> On Tue, Aug 25, 2015 at 01:44:13PM +0200, Vlastimil Babka wrote:
> > > >>> On 08/21/2015 02:10 PM, Kirill A. Shutemov wrote:
> > > >>>> On Thu, Aug 20, 2015 at 04:36:43PM -0700, Andrew Morton wrote:
> > > >>>>> On Wed, 19 Aug 2015 12:21:45 +0300 "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
> > > >>>>>
> > > >>>>>> The patch introduces page->compound_head into the third double-word block,
> > > >>>>>> in front of compound_dtor and compound_order. That means it shares storage
> > > >>>>>> space with:
> > > >>>>>>
> > > >>>>>> - page->lru.next;
> > > >>>>>> - page->next;
> > > >>>>>> - page->rcu_head.next;
> > > >>>>>> - page->pmd_huge_pte;
> > > >>>>>>
> > > >>>
> > > >>> We should probably ask Paul about the chances that rcu_head.next would like
> > > >>> to use the bit too one day?
> > > >>
> > > >> +Paul.
> > > >
> > > > The call_rcu() function does stomp that bit, but if you stop using that
> > > > bit before you invoke call_rcu(), no problem.
> > >
> > > You mean that it sets the bit 0 of rcu_head.next during its processing?
> >
> > Not at the moment, though RCU will splat if given a misaligned rcu_head
> > structure, because that bit might someday be used to flag callbacks
> > that do nothing but free memory. If RCU needs to do that (e.g., to
> > promote energy efficiency), then that bit might well be set during
> > RCU grace-period processing.
>
> Ugh.. :-/
>
> > > Bad news then. It's not that we would trigger that bit when the rcu_head part of
> > > the union is "active". It's that pfn scanners could inspect such a page at an
> > > arbitrary time, see bit 0 set (due to RCU processing), think that it's a
> > > tail page of a compound page, and interpret the rest of the pointer as a pointer
> > > to the head page (to test it for flags etc).
> >
> > On the other hand, if you avoid scanning rcu_head structures for pages
> > that are currently waiting for a grace period, no problem. RCU does
> > not use the rcu_head structure at all except between the time that
> > call_rcu() is invoked on that rcu_head structure and the time that
> > the callback is invoked.
> >
> > Is there some other page state that indicates that the page is waiting
> > for a grace period? If so, you could simply avoid testing that bit in
> > that case.
>
> No, I don't think so.

OK, I'll bite... How do you know that it is safe to invoke call_rcu(),
given that you are not allowed to invoke call_rcu() until the previous
callback has been invoked?

> For compound pages, most of the state information is stored in the head page
> (e.g. page_count(), flags, etc). So if we examine a random page (the pfn
> scanner case), the very first thing we want to know is whether we stepped on
> a tail page. PageTail() is what I wanted to encode in the bit...

Ah, so that would require the page scanner to do reverse mapping or some
such, then. Which is perhaps what you are trying to avoid.
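
For readers following along, here is a rough userspace sketch of the
encoding under discussion and of the hazard Vlastimil describes. The
fake_page and fake_rcu_head types and the helper names are hypothetical
stand-ins for illustration, not the kernel's struct page layout or API.
It assumes only that the tail flag lives in bit 0 of a word that
overlays rcu_head.next.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rcu_head (assumption, not kernel code). */
struct fake_rcu_head {
	struct fake_rcu_head *next;
	void (*func)(struct fake_rcu_head *);
};

/* Hypothetical stand-in for struct page: compound_head overlays ->next. */
struct fake_page {
	unsigned long flags;
	union {
		uintptr_t compound_head;       /* bit 0 set => tail page */
		struct fake_rcu_head rcu_head; /* ->next shares the same word */
	};
};

/* Mark 'tail' as a tail page of 'head' by tagging bit 0 of the pointer. */
void set_tail(struct fake_page *tail, struct fake_page *head)
{
	/* The scheme relies on page structures being at least 2-byte aligned. */
	assert(((uintptr_t)head & 1) == 0);
	tail->compound_head = (uintptr_t)head | 1;
}

/* What a pfn scanner would test first when it lands on an arbitrary page. */
bool page_is_tail(const struct fake_page *p)
{
	return (p->compound_head & 1) != 0;
}

/* Strip the tag to recover the head page; non-tail pages are their own head. */
struct fake_page *page_head(struct fake_page *p)
{
	uintptr_t v = p->compound_head;

	return (v & 1) ? (struct fake_page *)(v - 1) : p;
}
```

If anything ever sets bit 0 of rcu_head.next while the page is queued,
page_is_tail() on that page returns true and page_head() hands back a
bogus pointer, which is exactly the scanner problem described above.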

> What if we change the order of fields within rcu_head and put ->func first?
> Can we expect this pointer to always have bit 0 clear?

I asked that question some time back, and the answer was "no". You
can apparently have functions that start at odd addresses on some
architectures.
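
One case that I believe fits is ARM Thumb, where a pointer to a Thumb
function carries bit 0 set to select the instruction set. A minimal
sketch of the kind of check that would be needed before borrowing that
bit follows. The callback_t type and func_bit0_is_free() are
hypothetical names, and converting a function pointer to an integer is
not guaranteed by ISO C, so treat this as an illustration that assumes
a conventional flat address space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback type, loosely modelled on an RCU callback. */
typedef void (*callback_t)(void *);

/*
 * Report whether bit 0 of the function's address is clear, that is,
 * whether the bit could be borrowed as a flag without corrupting the
 * address. On an architecture with odd function addresses this can
 * return false for perfectly valid functions.
 */
bool func_bit0_is_free(callback_t f)
{
	return ((uintptr_t)f & 1) == 0;
}
```

A scheme that put ->func first would have to pass this check for every
callback on every supported architecture, which is why the answer was
"no".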

That said, there are likely to be reserved bits somewhere in the function
address, perhaps varying depending on architecture and/or boot, in the
case of address-space randomization. Perhaps there could be some way of
identifying those bits, with architecture-independent interfaces for
querying and setting them?

Thanx, Paul
