RE: [PATCH 1/9] iommu: Move iommu fault data to linux/iommu.h

From: Tian, Kevin
Date: Wed Jul 12 2023 - 23:22:30 EST


> From: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> Sent: Wednesday, July 12, 2023 5:34 PM
>
> On Wed, Jul 12, 2023 at 10:07:22AM +0800, Baolu Lu wrote:
> > > > +/**
> > > > + * struct iommu_fault_unrecoverable - Unrecoverable fault data
> > > > + * @reason: reason of the fault, from &enum iommu_fault_reason
> > > > + * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
> > > > + * @pasid: Process Address Space ID
> > > > + * @perm: requested permission access using by the incoming transaction
> > > > + * (IOMMU_FAULT_PERM_* values)
> > > > + * @addr: offending page address
> > > > + * @fetch_addr: address that caused a fetch abort, if any
> > > > + */
> > > > +struct iommu_fault_unrecoverable {
> > > > + __u32 reason;
> > > > +#define IOMMU_FAULT_UNRECOV_PASID_VALID (1 << 0)
> > > > +#define IOMMU_FAULT_UNRECOV_ADDR_VALID (1 << 1)
> > > > +#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID (1 << 2)
> > > > + __u32 flags;
> > > > + __u32 pasid;
> > > > + __u32 perm;
> > > > + __u64 addr;
> > > > + __u64 fetch_addr;
> > > > +};
> > >
> > > Currently there is no handler for unrecoverable faults.
>
> Yes those were meant for guest injection. Another goal was to replace
> report_iommu_fault(), which also passes unrecoverable faults to host
> drivers. Three drivers use that API:
> * usnic just prints the error, which could be done by the IOMMU driver,
> * remoteproc attempts to recover from the crash,
> * msm attempts to handle the fault, or at least recover from the crash.

I was not aware of them. Thanks for pointing them out.

>
> So the first one can be removed, and the others could move over to IOPF
> (which may need to indicate that the fault is not actually recoverable by
> the IOMMU) and return IOMMU_PAGE_RESP_INVALID.

Yep, presumably we should have just one interface to handle faults.
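
As a rough illustration of what "one interface" could look like, here is a
standalone userspace sketch. All demo_* names are hypothetical and are not
existing kernel API; the response codes only loosely mirror the
IOMMU_PAGE_RESP_* values mentioned above. It assumes the fault carries a flag
saying whether the IOMMU can actually retry the access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical response codes, loosely modelled on IOMMU_PAGE_RESP_*. */
enum demo_page_resp {
	DEMO_PAGE_RESP_SUCCESS,
	DEMO_PAGE_RESP_INVALID,
};

/* Hypothetical fault record shared by recoverable and unrecoverable faults. */
struct demo_fault {
	uint64_t addr;		/* faulting address */
	uint32_t perm;		/* requested access */
	bool recoverable;	/* false: the IOMMU cannot retry the access */
};

/* Callback that tries to make the faulting address accessible. */
typedef bool (*demo_fixup_fn)(uint64_t addr, uint32_t perm);

/* Example fixup (assumption): only addresses below 64KiB can be mapped. */
static bool demo_fixup_low_pages(uint64_t addr, uint32_t perm)
{
	(void)perm;
	return addr < 0x10000;
}

/* Single entry point: every fault goes through here and gets a response. */
static enum demo_page_resp demo_handle_fault(const struct demo_fault *f,
					     demo_fixup_fn fixup)
{
	assert(f != NULL);

	/* Unrecoverable: a driver may log or reset, but nothing is retried. */
	if (!f->recoverable)
		return DEMO_PAGE_RESP_INVALID;

	if (fixup && fixup(f->addr, f->perm))
		return DEMO_PAGE_RESP_SUCCESS;

	return DEMO_PAGE_RESP_INVALID;
}
```

With this shape, a driver that only wants to recover from a crash would see
the unrecoverable fault through the same path and always get the "invalid"
response back.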

>
> > >
> > > Both Intel/ARM register iommu_queue_iopf() as the device fault handler.
> > > It returns -EOPNOTSUPP for unrecoverable faults.
> > >
> > > In your series the common iommu_handle_io_pgfault() also only works
> > > for PRQ.
> > >
> > > It kind of suggests the above definitions are dead code, though arm-smmu-v3
> > > does attempt to set them.
> > >
> > > Probably it's right time to remove them.
> > >
> > > In the future even if there might be a need of forwarding unrecoverable
> > > faults to the user via iommufd, fault reasons reported by the physical
> > > IOMMU doesn't make any sense to the guest.
>
> I guess it depends on the architecture? The SMMU driver can report only
> stage-1 faults through iommu_report_device_fault(), which are faults due
> to a guest misconfiguring the tables assigned to it. At the moment
> arm_smmu_handle_evt() only passes down stage-1 page table errors, the rest
> is printed by the host.

In that case the kernel just needs to notify the vIOMMU that an error happened,
along with the access permissions (r/w/e/p). The vIOMMU can figure out the
reason itself by walking the stage-1 page table. It will likely find the same
reason as the host reports, but that sounds like a conceptually cleaner path.

>
> > > Presumably the vIOMMU
> > > should walk guest configurations to set a fault reason which makes sense
> > > from guest p.o.v.
> >
> > I am fine to remove unrecoverable faults data. But it was added by Jean,
> > so I'd like to know his opinion on this.
>
> Passing errors to the guest could be a useful diagnostics tool for
> debugging, once the guest gets more controls over the IOMMU hardware, but
> it doesn't have a purpose beyond that. It could be the only tool
> available, though: to avoid a guest voluntarily flooding the host logs by
> misconfiguring its tables, we may have to disable printing in the host
> errors that come from guest misconfiguration, in which case there won't be
> any diagnostics available for guest bugs.
>
> For now I don't mind if they're removed, if there is an easy way to
> reintroduce them later.
>

We can keep whatever is required to satisfy the kernel drivers which
want to know the fault.

But anything invented for the old uAPI (e.g. fault_reason) should be removed
now and redefined later, when support for the user is introduced.