Re: [PATCH] mm: shmem: do not call PageHWPoison on a ERR-page

From: Linus Torvalds
Date: Sat Nov 13 2021 - 17:59:11 EST


On Sat, Nov 13, 2021 at 2:30 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> The above snippet is actually ok since if *pagep returned via
> shmem_getpage()'s parameter is not NULL, then ret is 0.

That's a random implementation detail, and is not ok to rely on.

It may or may not be true, and is not part of the rules of error handling.

If a function returns an error, you shouldn't be looking at the other
stuff it returned.

Here's a very recent example of the same kind of problem:

https://lore.kernel.org/lkml/163663333331.414.639840290224641315.tip-bot2@tip-bot2/

where people didn't actually look properly at the return value of the
function, and instead looked at the page pointers that the function
filled in.

See? EXACT same logic. And completely buggy.

> When shmem_getpage() returns error code, *pagep is NULL IIUC.

No.

When a function returns an error code, you check for the error code,
and don't rely on whether the function then filled in other data (or
left it alone, or whatever).
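
To make that rule concrete, here is a minimal userspace C sketch. It is not
the shmem code: lookup_item(), get_id(), get_id_buggy() and the table are
hypothetical names invented for illustration. The lookup deliberately leaves
a stale non-NULL pointer behind on failure, which is exactly the kind of
implementation detail a caller cannot rule out.

```c
#include <errno.h>
#include <stddef.h>

struct item { int id; };

static struct item table[2] = { { 10 }, { 20 } };

/*
 * Hypothetical lookup: returns 0 and sets *out on success, or a negative
 * errno on failure. On failure *out holds a stale scratch value, so the
 * caller must not look at it.
 */
static int lookup_item(size_t idx, struct item **out)
{
	*out = &table[0];		/* scratch value written before validation */
	if (idx >= 2)
		return -ENOENT;		/* error: *out is now meaningless */
	*out = &table[idx];
	return 0;
}

/* Buggy caller: decides based on the pointer, not the return value. */
static int get_id_buggy(size_t idx)
{
	struct item *it = NULL;

	(void)lookup_item(idx, &it);
	if (it)
		return it->id;		/* wrongly "succeeds" for a bad idx */
	return -1;
}

/* Correct caller: the return value alone decides success or failure. */
static int get_id(size_t idx)
{
	struct item *it;
	int ret;

	ret = lookup_item(idx, &it);
	if (ret)
		return ret;		/* never touch 'it' on error */
	return it->id;
}
```

With an out-of-range index, get_id_buggy() hands back the id of an unrelated
entry, while get_id() propagates -ENOENT.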

So the code should

(a) check and handle error returns properly

(b) be legible

That (b) basically means that if it's not entirely trivial (and none
of this was entirely trivial), then when you get an error, you just
deal with it right away. You return early, and undo anything you need
to undo.
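
The "return early, and undo anything you need to undo" shape could look
roughly like the following userspace C sketch. All names here (struct pair,
alloc_buf(), pair_init(), pair_destroy()) are hypothetical, and the 'fail'
flag exists only so that a failure can be forced for demonstration.

```c
#include <errno.h>
#include <stdlib.h>

struct pair {
	char *a;
	char *b;
};

/* Hypothetical allocator wrapper; 'fail' forces an error for the demo. */
static int alloc_buf(size_t len, int fail, char **out)
{
	if (fail)
		return -ENOMEM;
	*out = malloc(len);
	return *out ? 0 : -ENOMEM;
}

static int pair_init(struct pair *p, int fail_second)
{
	int ret;

	ret = alloc_buf(16, 0, &p->a);
	if (ret)
		return ret;		/* nothing to undo yet */

	ret = alloc_buf(16, fail_second, &p->b);
	if (ret) {
		free(p->a);		/* undo the first step */
		p->a = NULL;
		return ret;		/* and return right away */
	}

	return 0;
}

static void pair_destroy(struct pair *p)
{
	free(p->a);
	free(p->b);
}
```

Each error is dealt with at the point where it happens, so no later step ever
runs with a pending error, and nothing reads p->b to guess whether the second
allocation worked.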

You don't do "oh, let's keep that error, and then do something else
that maybe also generates an error".

That "don't handle the error directly" was why
shmem_read_mapping_page_gfp() was buggy and would cause an oops.

And while the shmem_write_begin() code might not cause an oops, it had
the same fundamental bad pattern.

Error handling is where 99% of all problems occur. But that also means
that you should do the obvious thing wrt error handling, and not have
some crazy "if function X returned an error, it will have left the
return array untouched" which may or may not be true.

When a function returns an error code, you do error handling based on
that code. Not on some random other state.

Linus