Re: [RFC PATCH 00/11] Rust null block driver

From: Jens Axboe
Date: Thu May 04 2023 - 16:56:16 EST


On 5/4/23 1:59?PM, Andreas Hindborg wrote:
>
> Jens Axboe <axboe@xxxxxxxxx> writes:
>
>> On 5/4/23 12:52?PM, Keith Busch wrote:
>>> On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote:
>>>> On 5/4/23 11:15, Andreas Hindborg wrote:
>>>>> If it is still unclear to you why this effort was started, please do let
>>>>> me know and I shall try to clarify further :)
>>>>
>>>> It seems like I was too polite in my previous email. What I meant is that
>>>> rewriting code is useful if it provides a clear advantage to the users of
>>>> a driver. For null_blk, the users are kernel developers. The code that has
>>>> been posted is the start of a rewrite of the null_blk driver. The benefits
>>>> of this rewrite (making low-level memory errors less likely) do not outweigh
>>>> the risks that this effort will introduce functional or performance regressions.
>>>
>>> Instead of replacing, would co-existing be okay? Of course as long as
>>> there's no requirement to maintain feature parity between the two.
>>> Actually, just call it "rust_blk" and declare it has no relationship to
>>> null_blk, despite their functional similarities: it's a developer
>>> reference implementation for a rust block driver.
>>
>> To me, the big discussion point isn't really whether we're doing
>> null_blk or not, it's more if we want to go down this path of
>> maintaining rust bindings for the block code in general. If the answer
>> to that is yes, then doing null_blk seems like a great choice as it's
>> not a critical piece of infrastructure. It might even be a good idea to
>> be able to run both, for performance purposes, as the bindings or core
>> changes.
>>
>> But back to the real question... This is obviously extra burden on
>> maintainers, and that needs to be sorted out first. Block drivers in
>> general are not super security sensitive, as it's mostly privileged code
>> and there's not a whole lot of user visibile API. And the stuff we do
>> have is reasonably basic. So what's the long term win of having rust
>> bindings? This is a legitimate question. I can see a lot of other more
>> user exposed subsystems being of higher interest here.
>
> Even though the block layer is not usually exposed in the same way
> that something like the USB stack is, absence of memory safety bugs is
> a very useful property. If this is attainable without sacrificing
> performance, it seems like a nice option to offer future block device
> driver developers. Some would argue that it is worth offering even in
> the face of performance regression.
>
> While memory safety is the primary feature that Rust brings to the
> table, it does come with other nice features as well. The type system,
> language support stackless coroutines and error handling language
> support are all very useful.

We're in violent agreement on this part, I don't think anyone sane would
argue that memory safety with the same performance [1] isn't something
you'd want. And the error handling with rust is so much better than the
C stuff drivers do now that I can't see anyone disagreeing on that being
a great thing as well.

The discussion point here is the price being paid in terms of people
time.

> Regarding maintenance of the bindings, it _is_ an amount extra work. But
> there is more than one way to structure that work. If Rust is accepted
> into the block layer at some point, maintenance could be structured in
> such a way that it does not get in the way of existing C maintenance
> work. A "rust keeps up or it breaks" model. That could work for a while.

That potentially works for null_blk, but it would not work for anything
that people actually depend on. In other words, anything that isn't
null_blk. And I don't believe we'd be actively discussing these bindings
if just doing null_blk is the end goal, because that isn't useful by
itself, and at that point we'd all just be wasting our time. In the real
world, once we have just one actual driver using it, then we'd be
looking at "this driver regressed because of change X/Y/Z and that needs
to get sorted before the next release". And THAT is the real issue for
me. So a "rust keeps up or it breaks" model is a bit naive in my
opinion, it's just not a viable approach. In fact, even for null_blk,
this doesn't really fly as we rely on blktests to continually vet the
sanity of the IO stack, and null_blk is an integral part of that.

So I really don't think there's much to debate between "rust people vs
jens" here, as we agree on the benefits, but my end of the table has to
stomach the cons. And like I mentioned in an earlier email, that's not
just on me, there are other regular contributors and reviewers that are
relevant to this discussion. This is something we need to discuss.

[1] We obviously need to do real numbers here, the ones posted I don't
consider stable enough to be useful in saying "yeah it's fully on part".
If you have an updated rust nvme driver that uses these bindings I'd
be happy to run some testing that will definitively tell us if there's a
performance win, loss, or parity, and how much.

--
Jens Axboe