Re: [PATCH 0/10] bulk cpu removal support

From: Andrew Morton
Date: Thu May 11 2006 - 13:06:49 EST


Ashok Raj <ashok.raj@xxxxxxxxx> wrote:
>
> On Wed, May 10, 2006 at 11:06:06PM -0700, Andrew Morton wrote:
> Hi Andrew,
>
> > Shaohua Li <shaohua.li@xxxxxxxxx> wrote:
> > >
> > > CPU hotremove will migrate tasks and redirect interrupts off dead cpu.
> >
> > This seems an awful lot of code for something which happens so infrequently.
> >
> > How big is the problem you're fixing here, and what are the
> > user-observeable effects of these changes?
>
> This is useful when say a NUMA node is being removed. With new multi-core
> CPUs comming up, considering a 2 core with HT, we could have up to 4 logical
> per socket. On NUMA node with 4 sockets, a node removal will mean we
> do 16 single cpu offlines. Each time the process and interrupts could
> end up on a CPU that might be removed just immediatly.
>
> The same is also useful for SMP Suspend/resume cases since the logical offline
> is same here as well.
>
> Even thought the code changes seem a lot, most of it is just preparation of
> functions ready to accept a cpumask_t instead of a single cpu like earlier.
> The reason we split them to smaller chunks so the scope of change is well
> understood with each patch.
>
> The major changes are
>
> - stop machine to run cpu offline functions on each cpu going offline
> - prepare offline functions in offline path to take cpumask_t
> - Some task migrate dead lock removal consideration that we ran into
> during stress test.
>
> I know Shaohua ran tests for more than 20+ hrs with the patch, both on i386
> and x86_64.
>
> once we get some time deltas on a bigger machine it will help a lot.
> Iam also trying to check with some OEM';s who have such large machines for
> some data.. keep posted.
>

OK, thanks. I'm a little surprised that this patch wasn't accompanied by a
problem description, really. I mean, if a single CPU offlining takes three
milliseconds then why bother?

I assume it must take much longer, else you wouldn't have written the code.
Have you any ballpark numbers for how long it _does_ take?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/