Re: [PATCH v2] xfrm: policy: Fix double free in xfrm_policy_timer

From: Timo Teras
Date: Mon Mar 23 2020 - 03:56:55 EST


On Mon, 23 Mar 2020 15:21:45 +0800
Yuehaibing <yuehaibing@xxxxxxxxxx> wrote:

> On 2020/3/23 14:53, Timo Teras wrote:
> > Hi
> >
> > On Mon, 23 Mar 2020 09:41:55 +0800
> > YueHaibing <yuehaibing@xxxxxxxxxx> wrote:
> >
> >> After xfrm_add_policy adds a policy, its ref is 2, then
> >>
> >> xfrm_policy_timer
> >>   read_lock
> >>   xp->walk.dead is 0
> >>   ....
> >>   mod_timer()
> >>                         xfrm_policy_kill
> >>                           policy->walk.dead = 1
> >>                           ....
> >>                           del_timer(&policy->timer)
> >>                           xfrm_pol_put //ref is 1
> >>                           xfrm_pol_put //ref is 0
> >>                           xfrm_policy_destroy
> >>                             call_rcu
> >>   xfrm_pol_hold //ref is 1
> >>   read_unlock
> >>   xfrm_pol_put //ref is 0
> >>   xfrm_policy_destroy
> >>     call_rcu
> >>
> >> xfrm_policy_destroy is called twice, which may lead to a
> >> double free.
> >
> > I believe the timer changes were added later in commit e7d8f6cb2f
> > which added holding a reference when timer is running. I think it
> > fails to properly account for concurrently running timer in
> > xfrm_policy_kill().
>
> commit e7d8f6cb2f holds a reference when &pq->hold_timer is armed;
> in my case it's policy->timer, and hold_timer is not armed.

Ah, I misread. I should have waited until the first cup of coffee of
the morning.

I must not have fully understood the del_timer() return value back then.

I first thought a more robust fix would be to take an extra reference
at the beginning of the timer function (and instead of using the
mod_timer() return value to see if a new reference is needed, it could
be used in the prologue to "keep" the reference). This would guarantee
a valid reference count throughout the timer function.

But I suppose that, because of the above, xfrm_policy_kill() is the only
place supposed to delete the timer, and that's why it had the locking in
the first place. And the above "fix" might still leave the timer armed
after xfrm_policy_kill() has called del_timer(), which is wrong.

So perhaps it's more straightforward to just have the lock as it was
originally, around policy->walk.dead only, perhaps with a comment that
it synchronizes with the timer function.
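
To illustrate why the lock around walk.dead matters, here is a rough
single-threaded userspace model of the interleaving from the trace. It
is only a sketch, not kernel code: struct model_policy and every
model_* function are hypothetical stand-ins, the rwlock is reduced to a
reader count, and "would block" is modeled by returning false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct xfrm_policy. */
struct model_policy {
	int  refcnt;       /* models the policy refcount */
	int  readers;      /* timer functions currently inside read_lock() */
	bool dead;         /* models policy->walk.dead */
	bool timer_armed;  /* models a pending policy->timer */
	int  destroyed;    /* how many times the destroy path ran */
};

static void model_put(struct model_policy *p)
{
	if (--p->refcnt == 0)
		p->destroyed++;
}

/* First half of the timer function; returns true if it re-armed the timer. */
static bool model_timer_begin(struct model_policy *p)
{
	p->timer_armed = false;      /* a firing timer is no longer pending */
	p->readers++;                /* read_lock() */
	if (p->dead)
		return false;
	p->timer_armed = true;       /* mod_timer() on an idle timer */
	return true;
}

/* Second half: take the new timer's reference, unlock, drop our own. */
static void model_timer_end(struct model_policy *p, bool rearmed)
{
	if (rearmed)
		p->refcnt++;         /* xfrm_pol_hold() */
	assert(p->readers > 0);
	p->readers--;                /* read_unlock() */
	model_put(p);                /* xfrm_pol_put() */
}

/*
 * Kill path. With take_lock set, marking the policy dead needs the write
 * lock; returning false models "would block until the reader is done".
 */
static bool model_kill(struct model_policy *p, bool take_lock)
{
	if (take_lock && p->readers > 0)
		return false;
	p->dead = true;
	if (p->timer_armed) {        /* del_timer() found the timer pending */
		p->timer_armed = false;
		model_put(p);
	}
	model_put(p);                /* drop the remaining reference */
	return true;
}
```

Starting from refcnt 2 with the timer armed, running model_kill()
without the lock between model_timer_begin() and model_timer_end()
reaches the destroy path twice. With the lock, the kill has to wait
for model_timer_end(), and the destroy path runs once.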

Since xfrm_policy_timer() already ends with a policy unref, the
reference-keeping trick above might be worth doing even for the current
code, as a separate patch, to avoid atomic ops where possible.
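
Roughly, the reference-keeping idea could look like the userspace sketch
below. It is only a model under assumptions, not a patch: struct
model_policy and the model_* helpers are hypothetical, the refcount is a
plain int, and model_mod_timer() mimics only the "was it already
pending" return value of mod_timer().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct xfrm_policy. */
struct model_policy {
	int  refcnt;       /* models the policy refcount */
	bool timer_armed;  /* models a pending policy->timer */
	bool dead;         /* models policy->walk.dead */
	int  destroyed;    /* how many times the destroy path ran */
};

static void model_put(struct model_policy *p)
{
	assert(p->refcnt > 0);       /* a put on a freed policy is a bug */
	if (--p->refcnt == 0)
		p->destroyed++;
}

/* Like mod_timer(): returns true if the timer was already pending. */
static bool model_mod_timer(struct model_policy *p)
{
	bool was_pending = p->timer_armed;

	p->timer_armed = true;
	return was_pending;
}

/*
 * Timer callback. It runs owning the reference that was taken when the
 * timer was armed. Instead of hold-then-put when re-arming, it hands
 * that same reference over to the newly armed timer.
 */
static void model_timer_fn(struct model_policy *p, bool want_rearm)
{
	p->timer_armed = false;      /* a firing timer is no longer pending */

	if (!p->dead && want_rearm && !model_mod_timer(p))
		return;              /* newly armed: timer keeps our reference */

	model_put(p);                /* not re-armed: drop the reference */
}
```

With refcnt 2 and the timer armed, a re-arming run leaves the count at 2
without touching it, and a final run that does not re-arm drops it to 1.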

Thanks,
Timo