Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()

From: Luck, Tony
Date: Tue Apr 16 2019 - 18:18:54 EST


On Tue, Apr 16, 2019 at 11:07:26AM +0200, Borislav Petkov wrote:
> On Mon, Apr 15, 2019 at 06:20:00PM -0700, Cong Wang wrote:
> > ce_arr.array[] is always within the range [0, ce_arr.n-1].
> > However, the binary search code in __find_elem() uses ce_arr.n
> > as the maximum index, which could lead to an off-by-one
> > out-of-bound access when the element after the last is exactly
> > the one just got deleted, that is, 'min' returned to caller as
> > 'ce_arr.n'.
>
> Sorry, I don't follow.
>
> There's a debugfs interface in /sys/kernel/debug/ras/cec/ with which you
> can input random PFNs and test the thing.
>
> Show me pls how this can happen with an example.

The array of previously seen pfn values is one page.

The problem case occurs when we've seen enough distinct
errors that we have filled every entry, then we try to
look up a pfn that is larger than any seen before.

The loop:

while (min < max) {
...
}

will terminate with "min" set to MAX_ELEMS. Then we
execute:

this_pfn = PFN(ca->array[min]);

which references beyond the end of the space allocated
for ca->array.

Probably won't crash, but we will read a garbage value
from whatever memory is allocated next.

Chances are high that the test:

if (this_pfn == pfn)

won't find that the garbage value matches the pfn that
we were looking for ... so we will likely be lucky and
not do anything too dumb. But we shouldn't just cross
our fingers and hope.
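
To make the failing case concrete, here is a minimal userspace
sketch of the search loop. It is not the kernel code: the struct
and function names are made up, MAX_ELEMS is shrunk to 8 for
illustration, and entries are assumed to hold bare pfn values
rather than whatever PFN() unpacks. It only returns the index
where the loop stops, without dereferencing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ELEMS 8	/* hypothetical; the real array fills one page */

/* Hypothetical stand-in for the real array structure. */
struct ce_array_sketch {
	uint64_t array[MAX_ELEMS];	/* sorted pfn values */
	unsigned int n;			/* number of valid entries */
};

/* Return the index at which the binary search loop stops. */
static unsigned int search_stop_index(const struct ce_array_sketch *ca,
				      uint64_t pfn)
{
	unsigned int min = 0, max = ca->n;

	assert(ca->n <= MAX_ELEMS);

	while (min < max) {
		unsigned int tmp = (min + max) / 2;

		if (ca->array[tmp] < pfn)
			min = tmp + 1;
		else if (ca->array[tmp] > pfn)
			max = tmp;
		else
			return tmp;
	}

	/* May equal ca->n, i.e. one past the last valid entry. */
	return min;
}
```

With a full array holding 10, 20, ..., 80 and a lookup for 90,
this returns 8 == MAX_ELEMS, so indexing the array with that
value would read past the end.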

Fix looks mostly OK, but we should probably move the

	if (to)
		*to = min;

inside the new

	if (min < ca->n) {
		...
	}

clause.
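
Roughly the shape I have in mind, as a userspace sketch only.
This is not the patch itself: the names are hypothetical,
MAX_ELEMS is shrunk to 8, entries are assumed to be bare pfn
values, and -1 stands in for whatever error code the real
function returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ELEMS 8	/* hypothetical; the real array fills one page */

/* Hypothetical stand-in for the real array structure. */
struct ce_array_checked {
	uint64_t array[MAX_ELEMS];	/* sorted pfn values */
	unsigned int n;			/* number of valid entries */
};

/* Return the index of pfn, or -1 (placeholder error) if absent. */
static int find_elem_checked(const struct ce_array_checked *ca,
			     uint64_t pfn, unsigned int *to)
{
	unsigned int min = 0, max = ca->n;

	assert(ca->n <= MAX_ELEMS);

	while (min < max) {
		unsigned int tmp = (min + max) / 2;

		if (ca->array[tmp] < pfn)
			min = tmp + 1;
		else if (ca->array[tmp] > pfn)
			max = tmp;
		else {
			min = tmp;
			break;
		}
	}

	/* Only touch array[min], and only report it, when in range. */
	if (min < ca->n) {
		if (to)
			*to = min;

		if (ca->array[min] == pfn)
			return (int)min;
	}

	return -1;
}
```

Note that in this sketch *to is left untouched when min == ca->n,
so a caller that relies on it would have to cope with that case.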

-Tony