Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs

From: Tong Tiangen
Date: Mon Mar 04 2024 - 03:52:59 EST




在 2024/3/3 2:06, Linus Torvalds 写道:
On Sat, 2 Mar 2024 at 01:37, Tong Tiangen <tongtiangen@xxxxxxxxxx> wrote:

I think this solution has two impacts:
1. Although it is not a performance-critical path, the CPU usage may be
affected by one more memory copy in some large-memory applications.

Compared to the IO, the extra memory copy is a non-issue.

If anything, getting rid of the "copy_mc" flag removes extra code in a
much more important path (ie the normal iov_iter code).

Indeed. I'll test this solution. Theoretically, it should solve the problem.


2. If a hardware memory error occurs in "good location" and the
".copy_mc" is removed, the kernel will panic.

That's always true. We do not support non-recoverable machine checks
on kernel memory. Never have, and realistically probably never will. >
In fact, as far as I know, the hardware that caused all this code in
the first place no longer exists, and never really made it to wide
production.

Yes. There is a low probability that the newly applied memory is faulty.

Thanks,
Tong.


The machine checks in question happened on pmem, now killed by Intel.
It's possible that somebody wants to use it for something else, but
let's hope any future implementations are less broken than the
unbelievable sh*tshow that caused all this code in the first place.

The whole copy_mc_to_kernel() mess exists mainly due to broken pmem
devices along with old and broken CPU's that did not deal correctly
with machine checks inside the regular memory copy ('rep movs') code,
and caused hung machines.

IOW, notice how 'copy_mc_to_kernel()' just becomes a regular
'memcpy()' on fixed hardware, and how we have that disgusting
copy_mc_fragile_key that gets enabled for older CPU cores.

And yes, we then have copy_mc_enhanced_fast_string() which isn't
*that* disgusting, and that actually handles machine checks properly
on more modern hardware, but it's still very much "the hardware is
misdesiged, it has no testing, and nobody sane should depend on this"

In other words, it's the usual "Enterprise Hardware" situation. Looks
fancy on paper, costs an arm and a leg, and the reality is just sad,
sad, sad.

Linus
.