Re: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

From: Christian König
Date: Mon Jul 31 2023 - 04:13:23 EST


On 21.07.23 at 12:33, Asahi Lina wrote:
[SNIP]

I've already tried to explain the issue. The DRM scheduler design, as it stands today, makes it impractical to write a safe Rust abstraction for it. This is a fact. Christian has repeatedly NAKed my attempts at changing it to make such a safe abstraction possible, and is clearly opposed to the fundamental lifetime requirements change I am trying to make here. Therefore, we are at an impasse.

It's unfortunate, but given this situation, the DRM scheduler will not be available to Rust DRM drivers. I thought the goal of the DRM subsystem common code was to cater to multiple drivers and usage approaches, with an emphasis on doing things "right" and avoiding design issues that are common mistakes in driver design. Clearly, this is not the case for all of DRM, at least not the DRM scheduler.

In software engineering, complex lifetime requirements are an anti-pattern, which is one reason why Rust draws a line between safe and unsafe APIs. For a safe API, it is up to the API developer to design it such that it cannot be misused in a way that causes memory safety issues, and the only lifetime requirements it can impose are those that can be expressed in the Rust type system and statically checked at compile time. The DRM scheduler's complex chain of lifetime requirements cannot, and wrapping it in enough glue to remove those lifetime requirements would require about as much code as just rewriting it, as well as add unacceptable duplication and overhead.
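
To make concrete what kind of requirement I mean, this is roughly the code behind the UAF in this patch's subject (paraphrased from memory, simplified):

static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
{
	struct drm_sched_fence *fence = to_drm_sched_fence(f);

	/* Nothing in any signature says so, but this only works if the
	 * scheduler behind fence->sched outlives every fence it ever
	 * produced -- a requirement the caller can't see and no compiler
	 * (C or Rust) can check for you. */
	return fence->sched->name;
}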

In kernel Rust, we strive to only have safe APIs for components which have no fundamental reason for being unsafe (things like direct memory mapping and raw hardware access). The DRM scheduler does not fall into one of those "inherently unsafe" categories, so it doesn't make sense to provide a raw unsafe API for it.

This is not completely correct. The DRM scheduler provides a dma_fence interface as a wrapper around the hardware dma_fence interfaces.

This interface is made to allow core Linux memory management to query the progress of hardware operations.

So you are working with something very low-level here and have to follow restrictions which Rust can't enforce at the language level, simply because the compiler can't know about them at compile time.
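
As a rough illustration (the names are made up, only the dma_fence_ops structure and its callbacks are real), the contract a driver-side fence implementation signs up for is something like this:

#include <linux/dma-fence.h>

static const char *my_fence_get_driver_name(struct dma_fence *f)
{
	return "my_driver";
}

static const char *my_fence_get_timeline_name(struct dma_fence *f)
{
	/* Whatever is returned here must stay valid as long as anybody can
	 * still hold a reference to the fence -- potentially long after the
	 * job, the scheduler or even the device is gone. */
	return "my_timeline";
}

static const struct dma_fence_ops my_fence_ops = {
	.get_driver_name   = my_fence_get_driver_name,
	.get_timeline_name = my_fence_get_timeline_name,
	/* All callbacks can be invoked by core memory management from more
	 * or less arbitrary context: no blocking allocations, no locks that
	 * depend on fence signaling, and so on.  None of that is visible at
	 * compile time. */
};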

Doing so would just expose Rust code to the kind of subtle safety implications that cause memory problems every day in C. If I were to use drm_sched's unsafe API "as is" in my driver, it would *by far* be the least auditable, most complex usage of unsafe code in the entire driver, and I have no confidence that I would be able to get it right and keep it right as the driver changes.

I don't see a reason why this cannot be simply fixed in drm_sched, but Christian disagrees, and has repeatedly (and strongly) NAKed my attempts at doing so, and indeed NAKed the entire premise of the change in lifetime requirements itself. So here we are. There is no solution that will work for everyone that involves drm_sched.

So I don't have a choice other than to just not use drm_sched and roll my own. I will happily move that Rust implementation to common code if another Rust driver comes along and wants to use it. I'm not sure if that kind of thing can be easily made available to C drivers in reverse, but I guess we'll cross that bridge when a C driver expresses interest in using it.

Well, to make it clear once more: Signaling a dma_fence from the destructor of a reference-counted object is very problematic! This will be rejected no matter whether you do it in C or in Rust.
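
To spell out the pattern that gets rejected (made-up names, purely an illustration):

#include <linux/container_of.h>
#include <linux/dma-fence.h>
#include <linux/kref.h>
#include <linux/slab.h>

struct my_job {
	struct kref refcount;
	struct dma_fence hw_fence;
};

static void my_job_release(struct kref *kref)
{
	struct my_job *job = container_of(kref, struct my_job, refcount);

	/* Problematic: the fence only signals when the last reference to the
	 * job goes away.  A waiter (for example core memory management) may
	 * itself be what keeps the job alive, so the fence might never
	 * signal, or signal at a completely unpredictable point in time. */
	dma_fence_signal(&job->hw_fence);
	kfree(job);
}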

What we can do is make it safe, in the sense that you don't access freed memory, by using the scheduler fences even more as wrappers around the hardware fence than we do now. But this is quite a change and requires a bit more than just hacking around drm_sched_fence_get_timeline_name().
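
Very roughly, the direction would be something like the following (just a sketch, not a patch; the details of getting there are exactly the "bit more" part):

static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
{
	struct drm_sched_fence *fence = to_drm_sched_fence(f);
	struct dma_fence *hw = READ_ONCE(fence->parent);

	/* The hardware fence is reference counted independently of the
	 * scheduler, so going through it avoids dereferencing scheduler
	 * memory which may already be gone.  But the hardware fence doesn't
	 * even exist before the job actually runs, which is one reason this
	 * is more than a one-line fix. */
	return hw ? hw->ops->get_timeline_name(hw) : "unbound";
}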


So far it seems existing C drivers are happy with drm_sched's design and complex lifetime requirements, even though those requirements aren't documented anywhere. Perhaps the authors managed to reverse-engineer them from the source, or someone told them about it (certainly nobody told me when I started using drm_sched). Or maybe a bunch of the drm_sched users are actually broken and have memory safety issues in corner cases. I don't know, though if I had to bet, I'd bet on the second option.

Actually, let's do a quick search and find out!

panfrost_remove() -> panfrost_device_fini() -> panfrost_job_fini() -> drm_sched_fini()

There is a direct, synchronous path between device removal and destroying the DRM scheduler. At no point does it wait for any jobs to complete. That's what patch #3 in this series tries to fix.

In fact, it doesn't even keep the entities alive! It calls drm_dev_unregister() before everything else, but as the docs for the DRM driver lifetimes clearly say (see, docs!), objects visible to userspace must survive that and only be released from the release callback. drm_sched entities are created/destroyed from panfrost_job_open()/panfrost_job_close(), which are called from panfrost_open() and panfrost_postclose(), which are the userspace file open/close functions.

That one I already fix in the Rust abstraction (it's relatively easy to handle there), so it doesn't need a drm_sched patch from my point of view, but it is yet another subtle, undocumented lifetime requirement that is, evidently, impossible to know about and get right without documentation.
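
For reference, the pattern the DRM docs describe is roughly this (simplified sketch with made-up names, not actual panfrost code):

#include <drm/drm_drv.h>
#include <linux/platform_device.h>

struct my_drm_device {
	struct drm_device drm;
};

static int my_platform_remove(struct platform_device *pdev)
{
	struct my_drm_device *mdev = platform_get_drvdata(pdev);

	/* This only hides the device from userspace; open files (and with
	 * them entities, fences, ...) can still be around after it returns. */
	drm_dev_unregister(&mdev->drm);

	/* Anything userspace can still reach must NOT be torn down here.  It
	 * has to stay alive until the last reference to the drm_device is
	 * dropped and the drm_driver.release callback (or drmm-managed
	 * cleanup) runs. */
	return 0;
}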

Meanwhile, panfrost_fence_ops has no remove() callback, which means there is no reference path stopping device removal (and therefore scheduler teardown) or even module unload while driver fences are still alive. That leads to the UAF that patch #2 in this series tries to fix.

In other words, as far as I can tell, the panfrost driver gets *everything* wrong when it comes to the DRM scheduler lifetime requirements, and will crash and burn if the driver is unbound while jobs are in flight, anyone has a file descriptor open at all, or even if any shared buffer holding a DRM scheduler fence from it is bound to the display controller at that time.

Yeah, what you wrote there is perfectly correct.

Daniel and I have gone back and forth multiple times on how we should design this, and we opted not to use direct pointers for the contexts because that allows for simpler driver implementations.

The DRM scheduler doesn't document the lifetime requirements because it doesn't define the lifetime requirements. By design, the DRM scheduler is supposed to be a component wrapping around DMA fences, and those DMA fences carry the necessary lifetime definition.

Now, DMA fences have their life cycle explained in the structure documentation, but it doesn't really say much about the requirements for dma_fence_ops implementations. We should probably improve that.

So yes, drivers need to keep the structures which might be accessed by userspace alive even after the underlying device is removed. But signaling dma_fences is completely independent of that.
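
As a minimal illustration of that independence (made-up function name, the dma_fence calls are the real API):

#include <linux/dma-fence.h>

static void my_complete_job(struct dma_fence *hw_fence)
{
	struct dma_fence *f = dma_fence_get(hw_fence);	/* take a reference */

	/* The device can be unbound at this point; that must not invalidate
	 * the fence or anything its callbacks dereference. */

	dma_fence_signal(f);	/* signaling doesn't care about device state */
	dma_fence_put(f);	/* the memory only goes away with the last ref */
}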


This is why this kind of design is not allowed in Rust.

Well it is allowed, it's just not safe.

Regards,
Christian.

Because nobody gets it right. *Especially* not without docs. I assumed, like the authors of the Panfrost driver clearly assumed, that the DRM scheduler API would not have these crazy undocumented hidden requirements. The only reason I found out the hard way is that I happen to create and destroy schedulers all the time, not just once globally, so I hit the bugs and dangling pointers much more often than Panfrost users who, most likely, never unbind their devices. But we both have the same problem.

I think I've done all I can to explain the issues and try to fix them, so the ball is in your court now. If you want to keep the current design, that's fine, but Rust code will not be a drm_sched user in that case. And the rest of the DRM folks in the C world will have to contend with these issues and fix all the problems in the C drivers (I'm sure panfrost isn't the only one, it's just literally the first one I looked at).

As for me, I'm happy to write a simple workqueue-based Rust scheduler suitable for devices with firmware-managed scheduling. Honestly, at this point, I have very little faith left in my ability to fix all these issues in drm_sched (I know there's at least one more lurking; I saw a NULL deref but I wasn't able to reproduce it or divine how it could possibly have happened). That, combined with the hostility from the AMD folks about my attempts to improve drm_sched even just a little bit, makes that decision very easy.

Farewell, DRM scheduler. It was nice trying to work with you, but things just didn't work out. I won't be submitting a v2 to this series myself. Please ping me if you fix all these fundamental design issues with drm_sched and think I might actually be able to use it safely in Rust one day. If, at that point, the API design is solid and safe and the implementation is done in a way that inspires confidence, maybe we can yank out my Rust solution and switch back to drm_sched.

Just please don't expect me to do the work any more, I've done everything I can and this now has to come from you, not me. I've spent way more time understanding drm_sched, auditing its code, understanding its design intent, trying to fix it, and getting yelled at for it than it would take to write a new, clean, safe Rust scheduler. I don't regret some of the time spent (some of the implementation details of drm_sched have taught me useful things), but I'm not prepared to spend any more, sorry.

~~ Lina