Re: [PATCH 0/7] padata: parallelize deferred page init

From: Daniel Jordan
Date: Thu Apr 30 2020 - 22:45:23 EST

Next message: Daniel Jordan: "Re: [PATCH 5/7] mm: move zone iterator outside of deferred_init_maxorder()"
Previous message: Alan Stern: "Re: [PATCH 08/15] usb: ehci: avoid gcc-10 zero-length-bounds warning"
In reply to: Pavel Tatashin: "Re: [PATCH 0/7] padata: parallelize deferred page init"
Next in thread: Josh Triplett: "Re: [PATCH 0/7] padata: parallelize deferred page init"

On Thu, Apr 30, 2020 at 05:40:59PM -0400, Pavel Tatashin wrote:
> On Thu, Apr 30, 2020 at 5:31 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > On Thu, 30 Apr 2020 16:11:18 -0400 Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> wrote:
> >
> > > Sometimes the kernel doesn't take full advantage of system memory
> > > bandwidth, leading to a single CPU spending excessive time in
> > > initialization paths where the data scales with memory size.
> > >
> > > Multithreading naturally addresses this problem, and this series is the
> > > first step.
> > >
> > > It extends padata, a framework that handles many parallel singlethreaded
> > > jobs, to handle multithreaded jobs as well by adding support for
> > > splitting up the work evenly, specifying a minimum amount of work that's
> > > appropriate for one helper thread to do, load balancing between helpers,
> > > and coordinating them. More documentation in patches 4 and 7.
> > >
> > > The first user is deferred struct page init, a large bottleneck in
> > > kernel boot--actually the largest for us and likely others too. This
> > > path doesn't require concurrency limits, resource control, or priority
> > > adjustments like future users will (vfio, hugetlb fallocate, munmap)
> > > because it happens during boot when the system is otherwise idle and
> > > waiting on page init to finish.
> > >
> > > This has been tested on a variety of x86 systems and speeds up kernel
> > > boot by 6% to 49% by making deferred init 63% to 91% faster.
> >
> > How long is this up-to-91% in seconds? If it's 91% of a millisecond
> > then not impressed. If it's 91% of two weeks then better :)

The largest system I could test had 384G per node and saved 1.5 out of 4
seconds.

> > Relatedly, how important is boot time on these large machines anyway?
> > They presumably have lengthy uptimes so boot time is relatively
> > unimportant?
>
> Large machines indeed have a lengthy uptime, but they also can host a
> large number of VMs meaning that downtime of the host increases the
> downtime of VMs in cloud environments. Some VMs might be very sensitive
> to downtime: game servers, traders, etc.
>
> > IOW, can you please explain more fully why this patchset is valuable to
> > our users?

I'll let the users speak for themselves, but my use case is similar to Pavel's:
limiting the downtime of VMs running on these large systems. Spinning up
instances as fast as possible is also desirable for our cloud users.
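
For readers who have not looked at the patches, the cover letter's description
of a multithreaded job (split the work, respect a minimum amount of work per
helper, balance load between helpers) can be illustrated with a rough userspace
sketch. This is not the padata API or the code in this series. The function
name, its parameters, and the factor of 4 used to size chunks are all
hypothetical, chosen only to show the idea.

```python
import threading


def run_multithreaded_job(start, size, min_chunk, max_threads, thread_fn):
    """Run thread_fn(lo, hi) over [start, start + size) using helper threads.

    Illustrative only: every name here is hypothetical, not a kernel interface.
    Returns the number of helper threads that were started.
    """
    if size <= 0:
        return 0

    # Do not start more helpers than there are min_chunk-sized pieces of work.
    nworks = max(1, min(max_threads, size // min_chunk))

    # Assumption: make chunks smaller than an even per-helper share (here a
    # quarter of it) so a helper that finishes early can pick up extra chunks,
    # but never go below the caller's minimum chunk size.
    chunk_size = max(min_chunk, size // (nworks * 4))

    end = start + size
    cursor = start
    lock = threading.Lock()

    def helper():
        nonlocal cursor
        while True:
            # Claim the next unprocessed chunk under the lock.
            with lock:
                if cursor >= end:
                    return
                lo = cursor
                hi = min(lo + chunk_size, end)
                cursor = hi
            # Do the actual work outside the lock.
            thread_fn(lo, hi)

    threads = [threading.Thread(target=helper) for _ in range(nworks)]
    for t in threads:
        t.start()
    # The caller waits for all helpers, much as boot waits for page init.
    for t in threads:
        t.join()
    return nworks
```

In this sketch, a caller would pass a range (for example, a span of page frame
numbers) and a function that initializes one sub-range. Helpers pull chunks
from a shared cursor, so a slow helper does not hold up the rest of the range.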
