Re: [PATCH 1/2] sched/deadline: add per rq tracking of admitted bandwidth

From: luca abeni
Date: Thu Feb 11 2016 - 07:40:35 EST


On Thu, 11 Feb 2016 12:27:54 +0000
Juri Lelli <juri.lelli@xxxxxxx> wrote:

> On 11/02/16 13:22, Luca Abeni wrote:
> > Hi Juri,
> >
> > On Thu, 11 Feb 2016 12:12:57 +0000
> > Juri Lelli <juri.lelli@xxxxxxx> wrote:
> > [...]
> > > I think we still have (at least) two problems:
> > >
> > > - select_task_rq_dl, if we select a different target
> > > - select_task_rq might make use of select_fallback_rq, if
> > > cpus_allowed changed after the task went to sleep
> > >
> > > Second case is what creates the problem here, as we don't update
> > > task_rq(p) and fallback_cpu ac_bw. I was thinking we might do so,
> > > maybe adding fallback_cpu in task_struct, from
> > > migrate_task_rq_dl() (it has to be added yes), but I fear that we
> > > should hold both rq locks :/.
> > >
> > > Luca, did you already face this problem (if I got it right) and
> > > thought of a way to fix it? I'll go back and stare a bit more at
> > > those paths.
> > In my patch I took care of the first case (modifying
> > select_task_rq_dl() to move the utilization from the "old rq" to the
> > "new rq"), but I never managed to trigger select_fallback_rq() in my
> > tests, so I overlooked that case.
> >
>
> Right, I was thinking to do the same. And you did that after grabbing
> both locks, right?

Not sure if I did everything correctly, but my code in
select_task_rq_dl() currently looks like this (you can obviously
ignore the "migrate_active" and "*_running_bw()" parts, and focus on
the "*_rq_bw()" stuff):
[...]
	if (rq != cpu_rq(cpu)) {
		int migrate_active;

		raw_spin_lock(&rq->lock);
		migrate_active = hrtimer_active(&p->dl.inactive_timer);
		if (migrate_active) {
			hrtimer_try_to_cancel(&p->dl.inactive_timer);
			sub_running_bw(&p->dl, &rq->dl);
		}
		sub_rq_bw(&p->dl, &rq->dl);
		raw_spin_unlock(&rq->lock);
		rq = cpu_rq(cpu);
		raw_spin_lock(&rq->lock);
		add_rq_bw(&p->dl, &rq->dl);
		if (migrate_active)
			add_running_bw(&p->dl, &rq->dl);
		raw_spin_unlock(&rq->lock);
	}
[...]

lockdep is not screaming, and I am not able to trigger any race
condition or strange behaviour (I am currently at more than 24h of
continuous stress-testing, but maybe my testcase is not so good at
finding races here :)



Luca