The challenge in making ECMDQ useful to Linux is ensuring that all the
commands expected to be within the scope of a future CMD_SYNC, plus that
sync itself, get issued on the same queue, so I'd be mildly surprised
if you didn't have the same problem.
PATCH-3 in this series actually helps keep the issued commands and the
SYNC on the same command queue when bool sync == true. However, a
sequence like issue->issue->issue_with_sync could be trickier.
Indeed, between the iommu_iotlb_gather mechanism and low-level command
batching, things are already a lot more concentrated than they could be,
but arm_smmu_cmdq_batch_add() and its callers stand out as examples of
where we'd still be vulnerable to preemption. What I haven't even tried
to reason about yet is the assumptions in the higher-level APIs, e.g.
whether io-pgtable might chuck out a TLBI during an iommu_unmap() which
we implicitly expect a later iommu_iotlb_sync() to cover.
Though I might have oversimplified the situation here, I see that the
arm_smmu_cmdq_batch_add() calls are typically followed by
arm_smmu_cmdq_batch_submit(). Could we just issue a SYNC in
_batch_submit() to every queue that _batch_add() previously touched?
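A minimal sketch of that idea, assuming a simplified model rather than the real arm-smmu-v3 structures: the batch records a bitmap of every queue that _batch_add() wrote to, and _batch_submit() then issues one SYNC on each of those queues. All names here (model_queue, model_batch, model_batch_add(), model_batch_submit(), NR_QUEUES) are hypothetical, and the caller simply states which queue a command landed on, standing in for whatever per-CPU selection ECMDQ would use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not the arm-smmu-v3 data structures: each queue
 * only counts what has been written to it. */
#define NR_QUEUES 4

struct model_queue {
	unsigned int nr_cmds;         /* non-SYNC commands written */
	unsigned int nr_syncs;        /* SYNCs written */
	unsigned int cmds_since_sync; /* commands not yet covered by a SYNC */
};

static struct model_queue queues[NR_QUEUES];

/* Hypothetical batch that remembers which queues it has touched. */
struct model_batch {
	uint32_t touched; /* bit q set => queue q holds commands from this batch */
};

/* Record one command that ended up on queue q (e.g. the queue of whichever
 * CPU we were running on at the time, which may change after preemption). */
static void model_batch_add(struct model_batch *b, unsigned int q)
{
	assert(q < NR_QUEUES);
	queues[q].nr_cmds++;
	queues[q].cmds_since_sync++;
	b->touched |= UINT32_C(1) << q;
}

/* Issue one SYNC on every queue the batch touched, so that all of the
 * batch's commands are covered no matter which queues they landed on. */
static void model_batch_submit(struct model_batch *b)
{
	for (unsigned int q = 0; q < NR_QUEUES; q++) {
		if (!(b->touched & (UINT32_C(1) << q)))
			continue;
		queues[q].nr_syncs++;
		queues[q].cmds_since_sync = 0;
	}
	b->touched = 0;
}
```

The cost in this model is one extra SYNC for each additional queue a batch spills onto; a batch that never migrates still pays for exactly one.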
I've been thinking that in many ways per-domain queues make quite a bit
of sense and would be easier to manage than per-CPU ones - plus that's
pretty much the usage model once we get to VMs anyway - but that fails
to help the significant cases like networking and storage where many
CPUs are servicing a big monolithic device in a single domain :(
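To make that trade-off concrete, here is a toy sketch of the two queue-selection policies, assuming a fixed pool of NR_QUEUES queues and a trivial modulo mapping; select_queue(), queues_used(), struct model_domain and the policy enum are hypothetical names, not existing driver code. Per-domain selection keeps everything for one domain (including its SYNC) on a single queue whichever CPU issues it, while per-CPU selection spreads one busy domain across all the queues but loses that guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool size and policies, for illustration only. */
#define NR_QUEUES 4

enum queue_policy { POLICY_PER_CPU, POLICY_PER_DOMAIN };

struct model_domain {
	unsigned int id; /* hypothetical domain identifier */
};

/* Pick a queue for a command issued by @cpu on behalf of @dom. */
static unsigned int select_queue(enum queue_policy policy,
				 const struct model_domain *dom,
				 unsigned int cpu)
{
	assert(dom);
	if (policy == POLICY_PER_DOMAIN)
		return dom->id % NR_QUEUES; /* same queue from every CPU */
	return cpu % NR_QUEUES;             /* follows whichever CPU issues */
}

/* Count the distinct queues one domain's commands would occupy when
 * nr_cpus CPUs all issue invalidations for it. */
static unsigned int queues_used(enum queue_policy policy,
				const struct model_domain *dom,
				unsigned int nr_cpus)
{
	bool used[NR_QUEUES] = { false };
	unsigned int n = 0;

	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
		unsigned int q = select_queue(policy, dom, cpu);

		if (!used[q]) {
			used[q] = true;
			n++;
		}
	}
	return n;
}
```

With eight CPUs servicing one domain, the per-domain policy funnels all of them onto a single queue, which is the networking/storage case above, while the per-CPU policy uses all four.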
Yeah, and it's hard to predict which client will use the CMDQ most
heavily, so that we could balance the queues or assign more of them to
that client; that feels like a QoS conundrum.