Re: [PATCH v4 05/45] drm/connector: Check drm_connector_init pointers arguments

From: Maxime Ripard
Date: Wed Nov 29 2023 - 08:26:20 EST


On Wed, Nov 29, 2023 at 01:40:38PM +0200, Jani Nikula wrote:
> On Wed, 29 Nov 2023, Maxime Ripard <mripard@xxxxxxxxxx> wrote:
> > On Wed, Nov 29, 2023 at 11:38:42AM +0200, Jani Nikula wrote:
> >> On Wed, 29 Nov 2023, Maxime Ripard <mripard@xxxxxxxxxx> wrote:
> >> > Hi Ville,
> >> >
> >> > On Tue, Nov 28, 2023 at 03:49:08PM +0200, Ville Syrjälä wrote:
> >> >> On Tue, Nov 28, 2023 at 02:29:40PM +0100, Maxime Ripard wrote:
> >> >> > On Tue, Nov 28, 2023 at 02:54:02PM +0200, Jani Nikula wrote:
> >> >> > > On Tue, 28 Nov 2023, Maxime Ripard <mripard@xxxxxxxxxx> wrote:
> >> >> > > > All the drm_connector_init variants take at least a pointer to the
> >> >> > > > device, connector and hooks implementation.
> >> >> > > >
> >> >> > > > However, none of them check their value before dereferencing those
> >> >> > > > pointers which can lead to a NULL-pointer dereference if the author
> >> >> > > > isn't careful.
> >> >> > >
> >> >> > > Arguably oopsing on the spot is preferable when this can't be caused by
> >> >> > > user input. It's always a mistake that should be caught early during
> >> >> > > development.
> >> >> > >
> >> >> > > Not everyone checks the return value of drm_connector_init and friends,
> >> >> > > so those cases will lead to more mysterious bugs later. And probably
> >> >> > > oopses as well.
> >> >> >
> >> >> > So maybe we can do both then, with something like
> >> >> >
> >> >> > if (WARN_ON(!dev))
> >> >> > 	return -EINVAL;
> >> >> >
> >> >> > if (drm_WARN_ON(dev, !connector || !funcs))
> >> >> > 	return -EINVAL;
> >> >> >
> >> >> > I'd still like to check for this, so we can have proper testing, and we
> >> >> > already check for those pointers in some places (like funcs in
> >> >> > drm_connector_init), so if we don't cover everything we're inconsistent.
> >> >>
> >> >> People will invariably cargo-cult this kind of stuff absolutely
> >> >> everywhere and then all your functions will have tons of dead
> >> >> code to check their arguments.
> >> >
> >> > And that's a bad thing because... ?
> >> >
> >> > Also, are you really saying that checking that your arguments make sense
> >> > is cargo-cult?
> >>
> >> It's a powerful thing to be able to assume a NULL argument is always a
> >> fatal programming error on the caller's side, and should oops and get
> >> caught immediately. It's an assertion.
> >
> > Yeah, but we're not really doing that either. We have no explicit
> > assertion anywhere. We take a pointer in, and just hope that it will be
> > dereferenced later on and that the kernel will crash. The pointer to the
> > functions especially is only dereferenced much later on.
> >
> > And assertions might be powerful, but being able to notice errors and
> > debug them is too. A panic takes away basically any remote access to
> > debug. If you don't have a console, you're done.
> >
> >> We're not talking about user input or anything like that here.
> >>
> >> If you start checking for things that can't happen, and return errors
> >> for them, you start gracefully handling things that don't have anything
> >> graceful about them.
> >
> > But there's nothing graceful to do here: you just return from your probe
> > function that you couldn't probe and that's it. Just like you do when
> > you can't map your registers, or get your interrupt, or register into
> > any framework (including drm_dev_register that pretty much every driver
> > handles properly if it returns an error, without being graceful about
> > it).
>
> Those are all dynamic things that can fail.
>
> Quite different from passing NULL dev, connector, or funcs to
> drm_connector_init() and friends.
>
> I think it's wrong to set the example that everything needs to be
> checked, everything needs to return an error, every call needs to check
> for error return, all the time, everywhere. People absolutely will cargo
> cult that, and that's what Ville is referring to.
>
> If you pass NULL dev, connector, or funcs to drm_connector_init() I
> think you absolutely deserve to get an oops.
>
> For dev, you could possibly not have reached the function with NULL
> dev. (And __drm_connector_init() has dev->mode_config before the check,
> so you'll get a static analyzer warning about dereference before the
> check.) If you have NULL connector, you didn't check for allocation
> failure earlier. If you have NULL funcs, you just passed NULL, because
> it's generally supposed to be a pointer to a static const struct.
>
> >> Having such checks in place trains people to think they *may* happen.
> >
> > In most cases, kmalloc can't fail. We seem to have a very different
> > policy towards it.
>
> Again, dynamic in nature and can fail.
>
> >> While it should fail fast and loud at the developer's first smoke test,
> >> and get fixed then and there.
> >
> > Returning an error + a warning also qualifies for "fail fast and loud".
> > But keeps the system alive for someone to notice in any case.
>
> But where do you draw the line?

This also applies to static things then.
drm_connector_attach_scaling_mode_property() or
drm_mode_create_colorspace_property() (or plenty of others) will check
on the value of the supported scaling modes or colorspaces, even though
they are static.

It looks like we have that policy of "just assert and roll with it" for
pointers, but not for other static values passed to those initialization
functions.
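
To make the comparison concrete, here is a minimal userspace sketch of
that kind of check on a static value. It is not the DRM code: the
WARN_ON() stand-in, the mode list and attach_scaling_mode_property()
are all hypothetical names made up for illustration.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's WARN_ON(). */
static int warn_on(int cond, const char *expr)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", expr);
	return cond;
}
#define WARN_ON(cond) warn_on(!!(cond), #cond)

/* Hypothetical scaling modes, for illustration only. */
enum { MODE_NONE, MODE_FULLSCREEN, MODE_CENTER, MODE_ASPECT, MODE_COUNT };

#define VALID_MODE_MASK ((1u << MODE_COUNT) - 1)

/*
 * Callers are expected to pass a compile-time constant mask, yet the
 * value is still validated: warn and return an error instead of
 * carrying on with a mask that names unknown modes.
 */
static int attach_scaling_mode_property(unsigned int mode_mask)
{
	if (WARN_ON(mode_mask == 0 || (mode_mask & ~VALID_MODE_MASK)))
		return -EINVAL;

	return 0;
}
```

The argument is just as static as a funcs pointer would be, and a bad
one is reported with a warning plus -EINVAL rather than left to blow up
later.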

> If we keep adding these checks to things that actually can't happen,
> we teach developers we need to check for impossible things. And we
> teach them not to trust anything.

Well, I certainly don't trust drivers to get things right.

> I scroll down the file and reach drm_connector_attach_edid_property().
> Should we NULL check connector? Should we change the function to int
> and return a value? Should the caller check the value? Then there's
> drm_connector_attach_encoder(). And
> drm_connector_has_possible_encoder(). And so on and so forth.
>
> Where do you draw the line?

If things can fail, we should expect the caller to handle the failure
somehow. The documentation of drm_connector_attach_encoder() states that
it can fail, so we should expect it.
drm_connector_has_possible_encoder() doesn't, so we can assume it can't
fail.

If the function can fail but wasn't designed or documented as such, then
it's on the function. If it was but the caller didn't handle the error
case, then that's on the caller.
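
As a rough userspace sketch of that split (the structures and function
bodies below are hypothetical and only borrow the naming pattern, they
are not the DRM implementations):

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified structures for illustration only. */
struct connector { unsigned int possible_encoders; };
struct encoder { unsigned int index; };

/*
 * Documented to fail: returns 0 on success or a negative errno.
 * Bad arguments are reported and rejected, and the caller is
 * expected to check the return value.
 */
static int connector_attach_encoder(struct connector *c,
				    const struct encoder *e)
{
	if (!c || !e) {
		fprintf(stderr, "WARNING: NULL argument\n");
		return -EINVAL;
	}

	if (e->index >= 32)
		return -EINVAL;

	c->possible_encoders |= 1u << e->index;
	return 0;
}

/* Not documented to fail: a pure query with no error to handle. */
static bool connector_has_possible_encoder(const struct connector *c,
					   const struct encoder *e)
{
	return (c->possible_encoders & (1u << e->index)) != 0;
}

/* Probe-style caller: propagate the error, nothing graceful about it. */
static int probe(struct connector *c, const struct encoder *e)
{
	int ret = connector_attach_encoder(c, e);

	if (ret)
		return ret;

	return 0;
}
```

The first function owns the contract "I can fail", so a caller ignoring
its return value is the caller's bug. The second one makes no such
promise, so any failure inside it would be on the function itself.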

Maxime
