Re: [PATCH 08/14] taint: add taint for direct hardware access

From: Konrad Rzeszutek Wilk
Date: Mon Feb 01 2021 - 21:51:40 EST


On Mon, Feb 01, 2021 at 11:01:11AM -0800, Dan Williams wrote:
> On Mon, Feb 1, 2021 at 10:35 AM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
> >
> > On 21-02-01 13:18:45, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Jan 29, 2021 at 04:24:32PM -0800, Ben Widawsky wrote:
> > > > For drivers that moderate access to the underlying hardware it is
> > > > sometimes desirable to allow userspace to bypass restrictions. Once
> > > > userspace has done this, the driver can no longer guarantee the sanctity
> > > > of either the OS or the hardware. When in this state, it is helpful for
> > > > kernel developers to be made aware (via this taint flag) of this fact
> > > > for subsequent bug reports.
> > > >
> > > > Example usage:
> > > > - Hardware xyzzy accepts 2 commands, waldo and fred.
> > > > - The xyzzy driver provides an interface for using waldo, but not fred.
> > > > - quux is convinced they really need the fred command.
> > > > - xyzzy driver allows quux to frob hardware to initiate fred.
> > >
> > > Would it not be easier to _not_ frob the hardware for fred-operation?
> > > Aka not implement it or just disallow in the first place?
> >
> > Yeah. So the idea is you either are in a transient phase of the command and some
> > future kernel will have real support for fred - or a vendor is being short
> > sighted and not adding support for fred.
> >
> > >
> > >
> > > > - kernel gets tainted.
> > > > - turns out fred command is borked, and scribbles over memory.
> > > > - developers laugh while closing quux's subsequent bug report.
> > >
> > > Yeah good luck with that theory in-the-field. The customer won't
> > > care about this and will demand a solution for doing fred-operation.
> > >
> > > Just easier to not do fred-operation in the first place,no?
> >
> > The short answer is, in an ideal world you are correct. See nvdimm as an example
> > of the real world.
> >
> > The longer answer. Unless we want to wait until we have all the hardware we're
> > ever going to see, it's impossible to have a fully baked, and validated
> > interface. The RAW interface is my admission that I make no guarantees about
> > being able to provide the perfect interface and giving the power back to the
> > hardware vendors and their driver writers.
> >
> > As an example, suppose a vendor shipped a device with their special vendor
> > opcode. They can enable their customers to use that opcode on any driver
> > version. That seems pretty powerful and worthwhile to me.
> >
>
> Powerful, frightening, and questionably worthwhile when there are
> already examples of commands that need extra coordination for whatever
> reason. However, I still think the decision tilts towards allowing
> this given ongoing spec work.
>
> NVDIMM ended up allowing unfettered vendor passthrough given the lack
> of an organizing body to unify vendors. CXL on the other hand appears
> to have more gravity to keep vendors honest. A WARN splat with a
> taint, and a debugfs knob for the truly problematic commands seems
> sufficient protection of system integrity while still following the
> Linux ethos of giving system owners enough rope to make their own
> decisions.
>
> > Or a more realistic example, we ship a driver that adds a command which is
> > totally broken. Customers can utilize the RAW interface until it gets fixed in a
> > subsequent release which might be quite a ways out.
> >
> > I'll say the RAW interface isn't an encouraged usage, but it's one that I expect
> > to be needed, and if it's not we can always try to kill it later. If nobody is
> > actually using it, nobody will complain, right :D
>
> It might be worthwhile to make RAW support a compile time decision so
> that Linux distros can only ship support for the commands the CXL
> driver-dev community has blessed, but I'll leave it to a distro
> developer to second that approach.

Couple of thoughts here:

- As a distro developer (well, actually a middle manager of distro
developers), this raw interface approach is a headache.

Customers will pick it up and use it since it is there, and the poor
support folks will have to dig through layers of different devices to
finally find out (for example) that some OEM firmware opcode X is a
debug facility for inserting corrupted data, while for another vendor
the same X opcode makes it go super-fast.

Not that anybody would do that, right? Ha!

- I imagine that some of the more vocal folks in the community will
make it difficult to integrate these patches with these two included
(especially this taint one). That will make acceptance of these
patches more difficult than it should be. If you really want them,
perhaps make them part of another patchset, or a follow-up one.

- I still don't get why, as a brand new community, hacks are coming up
(even when the hardware is not yet there) instead of pushing back at
the vendors to have a cleaned-up interface. I would get these things
in, say, two or three years .. but from the start? I get your point
about flexibility, but it seems to me that the right way is not to
give an open RAW interface (big barn door) but rather to maintain the
driver and grow it (properly constructed doors), adding functionality
to the driver as it comes about.