Re: Possible incorrect handling of fault injection inside KMSAN instrumentation

From: Alexander Potapenko
Date: Wed Apr 12 2023 - 10:40:24 EST


On Sat, Apr 8, 2023 at 5:51 PM Dipanjan Das <mail.dipanjan.das@xxxxxxxxx> wrote:
>
> Hi,

Hi Dipanjan, thanks a lot for the elaborate analysis!


> kmsan's allocation of shadow or origin memory in
> kmsan_vmap_pages_range_noflush() fails silently due to fault injection
> (FI). KMSAN sort of “swallows” the allocation failure, and moves on.
> When either of them is later accessed while updating the metadata,
> there are no checks to test the validity of the respective pointers,
> which results in a page fault.

You are absolutely right.

> Our conclusions/Questions:
>
> - Should KMSAN fail silently? Probably not. Otherwise, the
> instrumentation always needs to check whether shadow/origin memory
> exists.

KMSAN shouldn't fail silently in any case.
kmsan_vmap_pages_range_noflush() used to have KMSAN_WARN_ON() to catch
such cases, but unfortunately I failed to check the return values
of the kcalloc() calls.
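Here is a minimal userspace sketch of the missing check. It assumes, as
described above, that the hook gets its shadow and origin page arrays
from two kcalloc() calls. All names (fi_calloc(), fail_nth,
struct meta_pages, alloc_meta_pages()) are hypothetical stand-ins. This
is not KMSAN code and not the code in the commits linked further down.
fi_calloc() imitates fault injection by failing the Nth allocation.

```c
/* Userspace sketch with hypothetical names, not actual KMSAN code. */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical fault-injection knob: if nonzero, the Nth allocation fails. */
static int fail_nth;
static int alloc_count;

/* Stand-in for kcalloc() that can be forced to fail, similar to failslab. */
static void *fi_calloc(size_t n, size_t size)
{
	alloc_count++;
	if (fail_nth && alloc_count == fail_nth)
		return NULL;
	return calloc(n, size);
}

/* Hypothetical per-mapping metadata: shadow and origin page arrays. */
struct meta_pages {
	void **shadow;
	void **origin;
};

/* Returns 0 on success or -ENOMEM. It never leaves a NULL array in use. */
static int alloc_meta_pages(struct meta_pages *m, size_t nr)
{
	m->shadow = fi_calloc(nr, sizeof(*m->shadow));
	m->origin = fi_calloc(nr, sizeof(*m->origin));
	if (!m->shadow || !m->origin) {
		/* free(NULL) is a no-op, so freeing both is always safe. */
		free(m->shadow);
		free(m->origin);
		m->shadow = m->origin = NULL;
		return -ENOMEM;
	}
	return 0;
}

static void free_meta_pages(struct meta_pages *m)
{
	free(m->shadow);
	free(m->origin);
	m->shadow = m->origin = NULL;
}
```

The point is that a failure of either allocation is reported to the
caller, instead of leaving a NULL pointer behind to be dereferenced
later when the metadata is updated.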

> - Should KMSAN even be tested using fault injection? We are not sure.

Our KMSAN deployment on syzbot, for one, uses fault injection, so
having the two work well together is important.

> On one hand, the primary purpose of FI should be testing the
> application code. But also, inducing faults inside instrumentation
> clearly helps to find mistakes in that, too.

At first I considered adding a special GFP flag that would exempt the
tool's own allocations from fault injection.
But this would only push the allocation failures further down the line,
where they are harder to detect because they occur much less often.
We'd better handle the failures properly instead.

> - What is a fix for this? Should a failure in the KMSAN
> instrumentation be propagated up so that the kernel allocator
> (vzalloc() in this case) can “pretend” to fail, too?

Yes, I think so.
Here are two patches that fix the problem:
- https://github.com/google/kmsan/commit/b793a6d5a1c1258326b0f53d6e3ac8aa3eeb3499
  (for kmsan_vmap_pages_range_noflush());
- https://github.com/google/kmsan/commit/cb9e33e0cd7ff735bc302ff69c02274f24060cff
  (for kmsan_ioremap_page_range()).
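To illustrate the direction agreed on above (the hook reports the
failure, and the allocator fails as a whole), here is a userspace
sketch. It is not the code in those commits. meta_map_hook(),
map_pages(), my_vzalloc(), inject_meta_failure and the 4096-byte page
size are all hypothetical.

```c
/* Userspace sketch with hypothetical names, not actual kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for the sketch */

/* Hypothetical knob: make the metadata hook fail, as fault injection would. */
static int inject_meta_failure;

/* Stand-in for the KMSAN mapping hook. Returns 0 or a negative errno. */
static int meta_map_hook(size_t nr_pages)
{
	(void)nr_pages;
	return inject_meta_failure ? -ENOMEM : 0;
}

/* Stand-in for the mapping step. It propagates the hook's error. */
static int map_pages(size_t nr_pages)
{
	int err = meta_map_hook(nr_pages);

	if (err)
		return err;	/* never map pages that have no metadata */
	/* ... the real page-table updates would happen here ... */
	return 0;
}

/* Stand-in for vzalloc(). On any error it frees memory and returns NULL. */
static void *my_vzalloc(size_t size)
{
	size_t nr_pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	void *p = calloc(1, size);

	if (!p)
		return NULL;
	if (map_pages(nr_pages)) {
		free(p);
		return NULL;	/* caller sees an ordinary allocation failure */
	}
	return p;
}
```

With this shape, a fault injected into the tool's metadata allocation
reaches the caller as an ordinary NULL return from the allocator, a
case callers already have to handle.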

Can you please try them out?

Alex