Re: [PATCH 1/1] mm/page_alloc: Leave IRQs enabled for per-cpu page allocations

From: Mel Gorman
Date: Tue Oct 11 2022 - 04:25:41 EST


On Mon, Oct 10, 2022 at 10:45:43PM +0200, Vlastimil Babka wrote:
> On 10/10/22 16:22, Mel Gorman wrote:
> > On Wed, Aug 24, 2022 at 10:58:26PM -0600, Yu Zhao wrote:
> > > On Wed, Aug 24, 2022 at 8:18 AM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > The pcp_spin_lock_irqsave protecting the PCP lists is IRQ-safe as a task
> > > > allocating from the PCP must not re-enter the allocator from IRQ context.
> > > > In each instance where IRQ-reentrancy is possible, the lock is acquired using
> > > > pcp_spin_trylock_irqsave() even though IRQs are disabled and re-entrancy
> > > > is impossible.
> > > >
> > > > Demote the lock to pcp_spin_lock avoids an IRQ disable/enable in the common
> > > > case at the cost of some IRQ allocations taking a slower path. If the PCP
> > > > lists need to be refilled, the zone lock still needs to disable IRQs but
> > > > that will only happen on PCP refill and drain. If an IRQ is raised when
> > > > a PCP allocation is in progress, the trylock will fail and fallback to
> > > > using the buddy lists directly. Note that this may not be a universal win
> > > > if an interrupt-intensive workload also allocates heavily from interrupt
> > > > context and contends heavily on the zone->lock as a result.
> > >
> > > Hi,
> > >
> > > This patch caused the following warning. Please take a look.
> > >
> > > Thanks.
> > >
> > > WARNING: inconsistent lock state
> > > 6.0.0-dbg-DEV #1 Tainted: G S W O
> > > --------------------------------
> >
> > I finally found time to take a closer look at this and I cannot reproduce
> > it against 6.0. What workload triggered the warning, on what platform and
> > can you post the kernel config used please? It would also help if you
> > can remember what git commit the patch was tested upon.
> >
> > Thanks and sorry for the long delay.
>
> I didn't (try to) reproduce this, but FWIW the report looked legit to me, as
> after the patch, pcp_spin_trylock() has to be used for both allocation and
> freeing to be IRQ safe. free_unref_page() uses it, so it's fine. But as the
> stack trace in the report shows, free_unref_page_list() does pcp_spin_lock()
> and not _trylock, and that's IMHO the problem.
>

I completely agree, it was a surprise to me that IO completion would
happen in soft IRQ context even though blk_done_softirq indicates that
it is normal and I didn't manage to trigger that case myself. I wondered
if there was an easy way to force that which would have made testing of
this easier. I can live without the reproduction case and cc Yu Zhao after
6.1-rc1 comes out and I've fixed this.

--
Mel Gorman
SUSE Labs