Re: [PATCH] sched: tg_set_cfs_bandwidth() causes rq->lock deadlock

From: Peter Zijlstra
Date: Mon May 19 2014 - 06:32:59 EST


On Fri, May 16, 2014 at 12:38:21PM +0400, Roman Gushchin wrote:

> I still think, there is a deadlock. I'll try to explain.
> Three CPUs must be involved:
> CPU0                          CPU1                        CPU2
> take rq->lock                 period timer fired
> ...                           take cfs_b lock
> ...                           ...                         tg_set_cfs_bandwidth()
> throttle_cfs_rq()             release cfs_b lock          take cfs_b lock
> ...                           distribute_cfs_runtime()    timer_active = 0
> take cfs_b->lock              wait for rq->lock           ...
> __start_cfs_bandwidth()
> {wait for timer callback
>  break if timer_active == 1}
>
> So, CPU0 and CPU1 are deadlocked.

OK, so can someone explain this ->timer_active thing? esp. what's the
'obvious' difference with hrtimer_active()?


Ideally we'd change the lot to not have this, but if we have to keep it
we'll need to make it lockdep visible, because all this stinks.
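
For reference, the three-CPU scenario quoted above can be read as a wait-for
graph: CPU0 spins until the timer callback on CPU1 finishes, while CPU1 needs
the rq->lock that CPU0 holds. The following is only a rough userspace sketch
of that reading, not kernel code. The names in_wait_cycle(), build_scenario()
and the waits_on[] array are hypothetical and exist only for illustration; the
break-out on timer_active == 1 is taken from the quoted diagram.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS     3
#define NOT_WAITING (-1)

/*
 * waits_on[i] is the CPU that must make progress before CPU i can
 * continue, or NOT_WAITING. Follow the chain from 'start' and report
 * whether it leads back to 'start', i.e. whether 'start' is deadlocked.
 */
bool in_wait_cycle(const int waits_on[NR_CPUS], int start)
{
	int cur = start;

	assert(start >= 0 && start < NR_CPUS);

	for (int hops = 0; hops < NR_CPUS; hops++) {
		cur = waits_on[cur];
		if (cur == NOT_WAITING)
			return false;	/* chain ends, somebody can run */
		if (cur == start)
			return true;	/* came back around: cycle */
	}
	return false;
}

/* Hypothetical model of the quoted scenario, one edge per waiting CPU. */
void build_scenario(int waits_on[NR_CPUS], bool timer_active)
{
	/*
	 * CPU0 holds rq->lock and waits in __start_cfs_bandwidth() for the
	 * callback running on CPU1, unless it sees timer_active == 1.
	 */
	waits_on[0] = timer_active ? NOT_WAITING : 1;

	/* CPU1 runs the period timer callback and wants CPU0's rq->lock. */
	waits_on[1] = 0;

	/* CPU2 only cleared timer_active; it waits on nobody in this model. */
	waits_on[2] = NOT_WAITING;
}
```

With timer_active cleared, as CPU2 does in the diagram, in_wait_cycle()
reports a cycle for CPU0 and CPU1; with timer_active left set, CPU0 breaks out
and the chain ends. A wait of the "spin until the callback finishes" kind is
not a lock acquisition, which is presumably why it would have to be annotated
before lockdep could see it.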
