Re: [REGRESSION] funny sched_domain build failure during resume

From: Peter Zijlstra
Date: Wed May 14 2014 - 13:11:01 EST


On Wed, May 14, 2014 at 01:02:38PM -0400, Tejun Heo wrote:
> On Wed, May 14, 2014 at 04:00:34PM +0200, Peter Zijlstra wrote:
> > Does something like the below help any? I noticed those things (cpudl
> > and cpupri) had [NR_CPUS] arrays, which is always 'fun'.
> >
> > The below is a mostly no thought involved conversion of cpudl which
> > boots, I'll also do cpupri and then actually stare at the algorithms to
> > see if I didn't make any obvious fails.
>
> Yeah, should avoid large allocation on reasonably sized machines and I
> don't think 2k CPU machines suspend regularly. Prolly good / safe
> enough for -stable port?

Yeah, it's certainly -stable material, especially if this cures the
immediate problem.

> It'd be still nice to avoid allocations if
> possible during online tho given that the operation happens while mm
> is mostly crippled.

Yeah, I started looking at that but that turned out to be slightly more
difficult than I had hoped (got lost in the suspend code). Also avoiding
large-order allocs is good practice regardless.

So probably the easiest way to not free/alloc the entire sched_domain
thing is just keeping it around in its entirety over suspend/resume, as
I think the promise of suspend/resume is that you return to the
status quo.

But I'll stick it on the todo list after fixing this use-after-free
thing I've been trying to chase down.

