Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver

From: Jason Gunthorpe
Date: Mon Sep 21 2020 - 07:52:44 EST


On Sun, Sep 20, 2020 at 10:47:02AM +0200, Greg Kroah-Hartman wrote:
> > If not, what open source userspace are you going to ask them to
> > present to merge the kernel side into misc?
>
> I don't think that they have a userspace api to their rdma feature from
> what I understand, but I could be totally wrong as I do not know their
> hardware at all, so I'll let them answer this question.

I thought Oded was pretty clear, the goal of this series is to expose
their RDMA HW to userspace. This problem space requires co-mingling
networking and compute at extremely high speed/low overhead. This is
all done in userspace.

We are specifically talking about this in
include/uapi/misc/habanalabs.h:

/*
* NIC
*
* This IOCTL allows the user to manage and configure the device's NIC ports.
* The following operations are available:
* - Create a completion queue
* - Destroy a completion queue
* - Wait on completion queue
* - Poll a completion queue
* - Update consumed completion queue entries
* - Set a work queue
* - Unset a work queue
*
* For all operations, the user should provide a pointer to an input structure
* with the context parameters. Some of the operations also require a pointer to
* driver regarding how many of the available CQEs were actually
* processed/consumed. Only then the driver will override them with newer
* entries.
* The set WQ operation should provide the device virtual address of the WQ with
* a matching size for the number of WQs and entries per WQ.
*
*/
#define HL_IOCTL_NIC _IOWR('H', 0x07, struct hl_nic_args)

Which is ibv_create_qp, ibv_create_cq, ibv_poll_cq, etc, etc

Habana has repeatedly described their HW as having multiple 100G RoCE
ports. RoCE is one of the common industry standards that ibverbs
unambiguously is responsible for.

I would be much less annoyed if they were not actively marketing their
product as RoCE RDMA.

Sure there is some argument that their RoCE isn't spec compliant, but
I don't think it excuses the basic principle of our subsystem:

RDMA HW needs to demonstrate some basic functionality using the
standard open source userspace software stack.

I don't like this idea of backdooring a bunch of proprietary closed
source RDMA userspace through drivers/misc, and if you don't have a
clear idea how to get something equal for drivers/misc you should not
accept the H_IOCTL_NIC.

Plus RoCE is complicated, there is a bunch of interaction with netdev
and rules related to that that really needs to be respected.

> For anything that _has_ to have a userspace RMDA interface, sure ibverbs
> are the one we are stuck with, but I didn't think that was the issue
> here at all, which is why I wrote the above comments.

I think you should look at the patches #8 through 11:

https://lore.kernel.org/lkml/20200915171022.10561-9-oded.gabbay@xxxxxxxxx/

Jason