Re: [PATCH -v2.1] x86/msr: Filter MSR writes

From: Chris Down
Date: Tue Jul 14 2020 - 13:02:38 EST


Luck, Tony writes:
> On Tue, Jul 14, 2020 at 05:04:48PM +0100, Chris Down wrote:
> > Borislav Petkov writes:
> > > On Tue, Jul 14, 2020 at 01:19:55PM +0100, Chris Down wrote:
> > > > That is, even with pr_err_ratelimited, we still end up logging on basically
> > > > every single write, even though it's from the same TGID writing to the same
> > > > MSRs, and end up becoming >80% of kmsg.
> > > >
> > > > Of course, one can boot with `allow_writes=1` to avoid these messages at
> > >
> > > Yes, use that.
> > >
> > > From a quick scan over that "tool" you pointed me at, it pokes at some
> > > MSRs from userspace which the kernel *also* writes to and this is
> > > exactly what should not be allowed.
> >
> > I don't think we're in disagreement about that. My concern is strictly about
> > the amount of spam caused for some of those existing use cases during the
> > transition phase. People should know that their tools would break, but there
> > shouldn't be so many messages generated that it inevitably pushes other
> > useful information out of the kmsg buffer.
>
> Maybe we just need smarter filtering of warnings. It doesn't
> seem at all useful to warn for the same MSR 1000's of times.
> Maybe keep a count of warnings for each MSR and just stop
> all reports when reach a threshold?

That's also a fine solution, albeit more complex than just defining a custom ratelimit_state with the existing infrastructure. Doing so would probably also mean missing out on some of what comes for free with it, such as the per-interval reset and the count of suppressed messages.
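For comparison, here is a rough userspace approximation of what a custom ratelimit_state provides: a budget of `burst` messages per `interval`, reset when the window expires, with a summary of how many messages were dropped in the previous window. This is a hypothetical sketch modelled on the semantics of the kernel's `___ratelimit()`, not its actual code; `struct rl_state`, `rl_allow()` and the explicit `now` tick argument are illustrative assumptions (the kernel version works on jiffies and takes a lock).

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace approximation of an interval/burst rate limiter,
 * modelled on the behaviour of the kernel's struct ratelimit_state.
 * Time is passed in explicitly (arbitrary ticks) so the logic is testable.
 */
struct rl_state {
	unsigned long interval; /* window length, in ticks */
	unsigned int burst;     /* messages allowed per window */
	unsigned int printed;   /* messages allowed so far in this window */
	unsigned int missed;    /* messages suppressed so far in this window */
	unsigned long begin;    /* tick at which the current window started */
	bool started;           /* false until the first call */
};

/* Returns true if the caller may print, false if the message is dropped. */
static bool rl_allow(struct rl_state *rs, unsigned long now)
{
	if (!rs->started) {
		rs->started = true;
		rs->begin = now;
	}

	/* Window expired: report what was dropped, then start a new window. */
	if (now - rs->begin >= rs->interval) {
		if (rs->missed)
			fprintf(stderr, "%u messages suppressed\n", rs->missed);
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}
```

Unlike a one-shot per-MSR counter, this keeps letting a few messages through every window for as long as the writes continue. That gives ongoing visibility, but it is also why a steady stream of writes can still add up to a large share of kmsg.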

My only other concern with ratelimiting per-TGID or per-MSR was that the ratelimit cache table could become unwieldy, but if we keep it simple by capping its size and not printing anything once the cap is reached, that sounds fine too.
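A minimal sketch of the bounded-table approach, assuming warnings are keyed on the (TGID, MSR) pair: each pair is reported at most `WARN_THRESHOLD` times, at most `MAX_ENTRIES` pairs are tracked, and once the table is full nothing further is printed for new pairs. All names and limits are hypothetical, and the sketch ignores locking, which real kernel code sharing such a table across writers would need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: warn at most WARN_THRESHOLD times per (tgid, msr)
 * pair, track at most MAX_ENTRIES distinct pairs, and stay silent for new
 * pairs once the table is full. Not thread-safe; limits are illustrative.
 */
#define MAX_ENTRIES    64
#define WARN_THRESHOLD 3

struct msr_warn_entry {
	int tgid;
	uint32_t msr;
	unsigned int count;   /* warnings already emitted for this pair */
};

struct msr_warn_table {
	struct msr_warn_entry entries[MAX_ENTRIES];
	size_t used;          /* number of populated entries */
};

/* Returns true if a warning should be printed for this write. */
static bool msr_should_warn(struct msr_warn_table *t, int tgid, uint32_t msr)
{
	for (size_t i = 0; i < t->used; i++) {
		struct msr_warn_entry *e = &t->entries[i];

		if (e->tgid == tgid && e->msr == msr) {
			if (e->count >= WARN_THRESHOLD)
				return false;   /* this pair has used up its quota */
			e->count++;
			return true;
		}
	}

	/* Table full: stay quiet rather than grow without bound. */
	if (t->used == MAX_ENTRIES)
		return false;

	t->entries[t->used++] = (struct msr_warn_entry){
		.tgid = tgid, .msr = msr, .count = 1,
	};
	return true;
}
```

A linear scan is adequate for a table this small. The cap keeps both memory use and kmsg output bounded, at the cost of going silent for any pair that first shows up after the table has filled.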

Any solution that avoids saturating kmsg for workloads which currently twiddle MSRs sounds fine to me. People should know that we don't support or encourage this, but that shouldn't come at the cost of potentially pushing everything else out of the kmsg buffer.