Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()

From: Cong Wang
Date: Tue Apr 16 2019 - 19:48:10 EST


On Tue, Apr 16, 2019 at 4:28 PM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
> On Tue, Apr 16, 2019 at 04:18:57PM -0700, Cong Wang wrote:
> > > The problem case occurs when we've seen enough distinct
> > > errors that we have filled every entry, then we try to
> > > look up a pfn that is larger than any seen before.
> > >
> > > The loop:
> > >
> > > while (min < max) {
> > > ...
> > > }
> > >
> > > will terminate with "min" set to MAX_ELEMS. Then we
> > > execute:
> > >
> > > this_pfn = PFN(ca->array[min]);
> > >
> > > which references beyond the end of the space allocated
> > > for ca->array.
> >
> > Exactly.
>
> Hmmm. But can we ever really have this happen? The call
> sequence to get here looks like:
>
>
> mutex_lock(&ce_mutex);
>
> if (ca->n == MAX_ELEMS)
> 	WARN_ON(!del_lru_elem_unlocked(ca));
>
> ret = find_elem(ca, pfn, &to);
>
> I.e. if the array was all the way full, we delete one element
> before calling find_elem(). So when we get here:
>
> static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
> {
> 	u64 this_pfn;
> 	int min = 0, max = ca->n;
>
> The biggest value "max" can have is MAX_ELEMS-1

This is exactly why the crash happens inside memmove() rather
than inside find_elem(). del_elem() ends up off by two once we
get past its 'if' check on line 232:

229 static void del_elem(struct ce_array *ca, int idx)
230 {
231 	/* Save us a function call when deleting the last element. */
232 	if (ca->n - (idx + 1))
233 		memmove((void *)&ca->array[idx],
234 			(void *)&ca->array[idx + 1],
235 			(ca->n - (idx + 1)) * sizeof(u64));
236
237 	ca->n--;
238 }

If idx is ca->n and ca->n is MAX_ELEMS-1, then ca->n - (idx + 1)
is nonzero, so the 'if' check passes and memmove() runs. Its source
is &ca->array[idx + 1], and idx + 1 is MAX_ELEMS, which is just
beyond the valid range.
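
To make the index arithmetic concrete, here is a small user-space
sketch. It is not the kernel code: the sketch_* names, the tiny
MAX_ELEMS value and the plain signed "int n" are assumptions made
for illustration only, and the search is an ordinary lower-bound
loop rather than a copy of __find_elem(). It only computes the
indices involved and never performs the out-of-range access.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ELEMS 8	/* hypothetical small capacity, for the sketch only */

struct sketch_array {
	uint64_t array[MAX_ELEMS];
	int n;			/* number of valid, sorted entries */
};

/*
 * Lower-bound search over the first ca->n entries. Returns the index
 * where pfn is, or where it would be inserted. The result can equal
 * ca->n when pfn is larger than every stored value, so the caller
 * must not dereference array[ret] without checking.
 */
static int sketch_find_slot(const struct sketch_array *ca, uint64_t pfn)
{
	int min = 0, max = ca->n;

	assert(ca->n >= 0 && ca->n <= MAX_ELEMS);

	while (min < max) {
		int mid = min + (max - min) / 2;

		if (ca->array[mid] < pfn)
			min = mid + 1;
		else
			max = mid;
	}
	return min;
}

/* The value tested by the 'if' on line 232 of del_elem(). */
static int sketch_del_guard(int n, int idx)
{
	return n - (idx + 1);
}

/*
 * Returns 1 when the guard is nonzero and the memmove() source index
 * (idx + 1) lies at or past MAX_ELEMS, i.e. outside the array.
 */
static int sketch_del_reads_past_end(int n, int idx)
{
	return sketch_del_guard(n, idx) != 0 && idx + 1 >= MAX_ELEMS;
}
```

With n == MAX_ELEMS-1 and a pfn larger than everything stored,
sketch_find_slot() returns n, the guard evaluates to -1, and
sketch_del_reads_past_end() reports the out-of-range source index.
If n were unsigned, the guard would wrap to a large value instead
of -1, but it would still be nonzero.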

Thanks.