Re: [RFC PATCH 0/4] Gang scheduling in CFS

From: Nikunj A Dadhania
Date: Mon Dec 19 2011 - 06:44:06 EST


On Mon, 19 Dec 2011 12:23:26 +0100, Ingo Molnar <mingo@xxxxxxx> wrote:
>
> * Nikunj A. Dadhania <nikunj@xxxxxxxxxxxxxxxxxx> wrote:
>
> > The following patches implement gang scheduling. These
> > patches are *highly* experimental in nature and are not
> > proposed for inclusion at this time.
> >
> > Gang scheduling is an approach where we make an effort to
> > run related tasks (the gang) at the same time on a number
> > of CPUs.
>
> The thing is, the (non-)scalability consequences are awful, gang
> scheduling is a true scalability nightmare. Things like this in
> gang_sched():
>
> +	for_each_domain(cpu_of(rq), sd) {
> +		count = 0;
> +		for_each_cpu(i, sched_domain_span(sd))
> +			count++;
>
> makes me shudder.
>
One point to note here: this loop runs only once, when electing the
gang_leader. That election can be done at bootup, and then redone
whenever a CPU is offlined or onlined.
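
To illustrate the idea of counting once and caching the result, here is a
minimal userspace sketch. It is not code from the patch series: struct
domain, recount_domains() and domain_weight() are hypothetical names, and
a 64-bit mask stands in for the real cpumask. In the kernel itself, the
open-coded loop could likely be replaced by
cpumask_weight(sched_domain_span(sd)).

```c
#include <stdint.h>

/* Hypothetical stand-in for a sched domain: a bitmask of the CPUs it spans. */
struct domain {
	uint64_t span;       /* CPUs covered by this domain */
	unsigned int weight; /* cached count of online CPUs in span */
};

/* Count set bits by clearing the lowest set bit until none remain. */
static unsigned int popcount64(uint64_t x)
{
	unsigned int n = 0;

	while (x) {
		x &= x - 1;
		n++;
	}
	return n;
}

/*
 * Slow path: recompute the cached weights. Meant to run at boot and
 * again on CPU offline/online, never from the scheduling fast path.
 */
static void recount_domains(struct domain *d, int nr, uint64_t online_mask)
{
	for (int i = 0; i < nr; i++)
		d[i].weight = popcount64(d[i].span & online_mask);
}

/* Fast path: read the cached value, no iteration over CPUs. */
static unsigned int domain_weight(const struct domain *d)
{
	return d->weight;
}
```

With this split, the per-CPU walk is paid only when the topology changes,
and the gang leader election reads domain_weight() instead of iterating.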

> So could we please approach this from the benchmarked workload
> angle first? The highest improvement is in ebizzy:
>
<snip>
> > ebizzy 2vm (improved 15 times, i.e. 1520%)
> > +------------+--------------------+--------------------+----------+
> > |                             Ebizzy                              |
> > +------------+--------------------+--------------------+----------+
> > | Parameter  |           Baseline |            gang:V2 |  % imprv |
> > +------------+--------------------+--------------------+----------+
> > | EbzyRecords|            1709.50 |           27701.00 |     1520 |
> > | EbzyUser   |              20.48 |             376.64 |     1739 |
>
With gang scheduling, ebizzy is getting more user time.

> > | EbzySys    |            1384.65 |            1071.40 |       22 |
> > | EbzyReal   |             300.00 |             300.00 |        0 |
> > | BwUsage    |   2456114173416.00 |   2483447784640.00 |        1 |
> > | HostIdle   |              34.00 |              35.00 |       -2 |
> > | UsrTime    |               6.00 |              14.00 |      133 |
>
Even the guest numbers say so; these were collected with iostat inside
the guest.

>
> What's behind this huge speedup? Does ebizzy use user-space
> spinlocks perhaps? Could we do something on the user-space side
> to get a similar speedup?
>
Some more oprofile data here for the above ebizzy-2VM run:

ebizzy: gang top callers (2 VMs)
2147208 total                              0
 357627 ____pagevec_lru_add             1064
 297518 native_flush_tlb_others         1328
 245478 get_page_from_freelist           174
 219277 default_send_IPI_mask_logical    978
 168287 __do_page_fault                  159
 156154 release_pages                    336
  73961 handle_pte_fault                  20
  68923 down_read_trylock               2153
  60094 __alloc_pages_nodemask            29

ebizzy: nogang top callers (2 VMs)
2771869 total                              0
2653732 native_flush_tlb_others        11847
  16004 get_page_from_freelist            11
  15977 ____pagevec_lru_add               47
  13125 default_send_IPI_mask_logical     58
  10739 __do_page_fault                   10
   9379 release_pages                     20
   5330 handle_pte_fault                   1
   4727 down_read_trylock                147
   3770 __alloc_pages_nodemask             1
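
To make the two profiles easier to compare, the share of samples per
symbol can be worked out from the first column. The helper below is only
an illustrative sketch (sample_share() is a made-up name, not part of any
tool). Applied to native_flush_tlb_others it gives roughly 13.9% of all
samples in the gang run (297518 of 2147208) against roughly 95.7% in the
nogang run (2653732 of 2771869), which would be consistent with the
nogang guests spending most of their time in remote TLB flushes.

```c
/*
 * Percentage of all profile samples attributed to one symbol.
 * Returns 0.0 for an empty profile to avoid dividing by zero.
 */
static double sample_share(unsigned long samples, unsigned long total)
{
	if (total == 0)
		return 0.0;
	return 100.0 * (double)samples / (double)total;
}
```

For example, sample_share(297518, 2147208) covers the gang run and
sample_share(2653732, 2771869) the nogang run.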

Regards,
Nikunj
