Re: [PATCH 09/14] cxl/mem: Add a "RAW" send command

From: Dan Williams
Date: Mon Feb 01 2021 - 16:21:02 EST


On Mon, Feb 1, 2021 at 11:36 AM Konrad Rzeszutek Wilk
<konrad.wilk@xxxxxxxxxx> wrote:
>
> On Mon, Feb 01, 2021 at 11:27:08AM -0800, Ben Widawsky wrote:
> > On 21-02-01 13:24:00, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Jan 29, 2021 at 04:24:33PM -0800, Ben Widawsky wrote:
> > > > The CXL memory device send interface will have a number of supported
> > > > commands. The raw command is not such a command. Raw commands allow
> > > > userspace to send a specified opcode to the underlying hardware and
> > > > bypass all driver checks on the command. This is useful for a couple of
> > > > usecases, mainly:
> > > > 1. Undocumented vendor specific hardware commands
> > > > 2. Prototyping new hardware commands not yet supported by the driver
> > >
> > > This sounds like a recipe for ..
> > >
> > > In case you really really want this may I recommend you do two things:
> > >
> > > - Wrap this whole thing with #ifdef
> > > CONFIG_CXL_DEBUG_THIS_WILL_DESTROY_YOUR_LIFE
> > >
> > > (or something equivalant to make it clear this should never be
> > > enabled in production kernels).
> > >
> > > - Add a nice big fat printk in dmesg telling the user that they
> > > are creating a unstable parallel universe that will lead to their
> > > blood pressure going sky-high, or perhaps something more professional
> > > sounding.
> > >
> > > - Rethink this. Do you really really want to encourage vendors
> > > to use this raw API instead of them using the proper APIs?
> >
> > Again, the ideal is proper APIs. Barring that they get a WARN, and a taint if
> > they use the raw commands.
>
> Linux upstream is all about proper APIs. Just don't do this.
> >
> > >
> > > >
> > > > While this all sounds very powerful it comes with a couple of caveats:
> > > > 1. Bug reports using raw commands will not get the same level of
> > > > attention as bug reports using supported commands (via taint).
> > > > 2. Supported commands will be rejected by the RAW command.
> > > >
> > > > With this comes new debugfs knob to allow full access to your toes with
> > > > your weapon of choice.
> > >
> > > Problem is that debugfs is no longer "debug" but is enabled in
> > > production kernel.
> >
> > I don't see this as my problem. Again, they've been WARNed and tainted. If they
>
> Right not your problem, nice.
>
> But it is going to be the problem of vendor kernel engineers who don't have this luxury.
>
> > want to do this, that's their business. They will be asked to reproduce without
> > RAW if they file a bug report.
>
>
> This is not how customers see the world. "If it is there, then it is
> there to used right? Why else would someone give me the keys to this?"
>
> Just kill this. Or better yet, make it a seperate set of patches for
> folks developing code but not have it as part of this patchset.

In the ACPI NFIT driver, the only protection against vendor
shenanigans is the requirement that any and all DSM functions be
described in a public specification, so there is no unfettered access
to the DSM interface However, multiple vendors just went ahead and
included a "vendor passthrough" as a DSM sub-command in their
implementation. The driver does have the "disable_vendor_specific"
module parameter, however that does not amount to much more than a
stern look from the kernel at vendors shipping functionality through
that path rather than proper functions. It has been a source of bugs.

The RAW command proposal Ben has here is a significant improvement on
that status quo. It's built on the observation that customers pick up
the phone whenever their kernel backtraces, and makes it is easy to
spot broken tooling. That said, I think it is reasonable to place the
RAW interface behind a configuration option and let distribution
policy decide the availability.