Re: Linux regressions report for mainline [2023-04-16]

From: Linus Torvalds
Date: Tue Apr 18 2023 - 15:12:15 EST


On Tue, Apr 18, 2023 at 11:20 AM David Sterba <dsterba@xxxxxxx> wrote:
>
> There's also in-memory cache of already trimmed ranges since last mount
> so even running discard repeatedly (either fstrim or as mount option)
> will not do extra IO. We try hard not to provoke the firmware bugs.

So we've had devices that claim to support discard, and then simply don't.

I have dim memories of people reporting IO simply stopping working
after a discard when it confused the GC logic too much.

And yes, those dim memories are from many years ago when SSDs were
new and fancy, and we had all kinds of crazy stuff going on, including
serious big SSD manufacturers that came to the kernel summit and said
that we need to do IO in 64kB aligned hunks, because doing GC was too
hard.

Those people have now thankfully gone off and done shingled drives
instead and we can mostly ignore them (although I do note that btrfs
seems to be gulping down the shingled drive koolaid too), but I'm
afraid that some of that incompetence still exists in the form of old
drives.

And some of it isn't even that old. See commit 07d2872bf4c8 ("mmc:
core: Add SD card quirk for broken discard") which is from late last
year. I'm not sure what the failure case there was (apart from the
"mk2fs failed", which I _assume_ was mkfs or mke2fs).

The real problem cases tend to be things like random USB memory sticks
etc. I think the Sandisk MMC case is not that different. A lot of odd
small embedded flash controllers that have never been tested except
under Windows or in cameras or whatever.

So discard tends to have two classes of problems:

(a) performance problems due to being non-queued, or simply because
the flash controller is small and latency can be absolutely *huge*
when it processes trims

(b) the "it doesn't work at all" problem

and it's really that "it doesn't work" case I worry about.
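
For reference, what a block device *claims* about discard is visible
in sysfs. A minimal Python sketch, assuming the usual
/sys/block/<dev>/queue layout; the helper name advertised_discard and
its sysfs_root parameter are made up for illustration. A nonzero
discard_max_bytes only means the device advertises discard. It says
nothing about case (b).

```python
from pathlib import Path


def advertised_discard(dev, sysfs_root="/sys/block"):
    """Return what the block layer reports for discard on `dev`.

    Hypothetical helper: it only reads the advertised limits and cannot
    tell whether the firmware actually handles discard correctly.
    """
    queue = Path(sysfs_root) / dev / "queue"

    def read_int(name):
        try:
            return int((queue / name).read_text().strip())
        except (OSError, ValueError):
            return 0  # missing or unreadable attribute: treat as "no discard"

    max_bytes = read_int("discard_max_bytes")
    return {
        "claims_discard": max_bytes > 0,
        "discard_max_bytes": max_bytes,
        "discard_granularity": read_int("discard_granularity"),
    }
```

Something like advertised_discard("sda") would then show whether the
device claims support, which is exactly the claim that cannot be
trusted on the broken hardware.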

We have quite a few trim-related quirks. Do this:

git grep HORKAGE.*TRIM

to see just the libata cases. Yes, some of those are "the queued
version doesn't work". Others are just "it's not zero after trim".
Whatever. But some of them are literally "do not use trim at all".
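
Those three kinds of quirk can be modelled as flags looked up by
device model string. This is a toy Python sketch, not the libata
code: the flag names are invented, and the two table entries are
assumptions taken from the subjects of the two commits cited in this
mail.

```python
from enum import Flag
from fnmatch import fnmatchcase


class TrimQuirk(Flag):
    """Invented flag names mirroring the three kinds of quirk."""
    NONE = 0
    NO_QUEUED_TRIM = 1       # "the queued version doesn't work"
    NOT_ZERO_AFTER_TRIM = 2  # "it's not zero after trim"
    NO_TRIM = 4              # "do not use trim at all"


# Illustrative table only; patterns are assumed from the commit subjects.
QUIRKS = [
    ("SuperSSpeed S238*", TrimQuirk.NO_TRIM),
    ("M88V29*", TrimQuirk.NO_TRIM),
]


def trim_quirks(model):
    """Collect every quirk flag whose glob pattern matches the model."""
    quirks = TrimQuirk.NONE
    for pattern, flags in QUIRKS:
        if fnmatchcase(model, pattern):
            quirks |= flags
    return quirks


def may_issue_trim(model, queued):
    """Decide whether a (queued or non-queued) trim should be sent."""
    quirks = trim_quirks(model)
    if TrimQuirk.NO_TRIM in quirks:
        return False
    if queued and TrimQuirk.NO_QUEUED_TRIM in quirks:
        return False
    return True
```

The point of the sketch is the asymmetry: a device only lands in such
a table after somebody has already hit the failure and reported it.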

See commit cda57b1b05cf ("libata: force disable trim for SuperSSpeed
S238"), and tell me that the sentence

"This device loses blocks, often the partition table area, on trim"

doesn't worry you? Ok, so that's from 2015, so "old drives only".

Or how about c8ea23d5fa59 ("ata: libata-core: Disable TRIM on M88V29")
from last year:

"While it also advertises TRIM support, I/O errors are reported
when the discard mount option fstrim is used. TRIM also fails
when disabling NCQ and not just as an NCQ command"

Again, that's libata - odd crazy hardware. But it's exactly the odd
crazy hardware that worries me. When the failure mode isn't "it's
slow", but "it ATE MY WHOLE DISK", that's a scary scary problem.

Hmm?

I dunno. Maybe you have reason to believe that all of these cases have
been fixed, or that some of these were caused by kernel bugs because
we did things wrong, and those have been fixed.

But the failure modes just make me worry. From your email, it *seems*
like you think that the failures were primarily performance-related.

Linus