Re: [RFC PATCH 1/1] drivers: base: Expose probe failures via debugfs

From: Greg Kroah-Hartman
Date: Fri Jun 04 2021 - 08:58:58 EST


On Thu, Jun 03, 2021 at 11:00:14PM +0300, Adrian Ratiu wrote:
> On Thu, 03 Jun 2021, Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > On Thu, Jun 03, 2021 at 03:55:34PM +0300, Adrian Ratiu wrote:
> > > This adds a new devices_failed debugfs attribute to list driver
> > > probe failures other than -EPROBE_DEFER, which are still handled as
> > > before via their own devices_deferred attribute.
> >
> > Who is going to use this?
> >
>
> It's for KernelCI testing, I explained the background in my other reply.
>
> > > This is useful on automated test systems like KernelCI to avoid
> > > filtering dmesg dev_err() messages to extract potential probe
> > > failures.
> >
> > I thought we listed these already some other way today?
> >
>
> The only other place is the printk buffer via dev_err(); of the probe
> results, only the -EPROBE_DEFER subset is exported via debugfs.
>
> An additional problem with this new interface implementation is that it is
> based on the new-ish driver core "dev_err_probe" helper to which not all
> drivers have been converted (yet...), so there will be a mismatch between
> printk and this new interface.

Then why not convert those drivers to the new interface? :)
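Some context on why the patch depended on dev_err_probe(): the helper returns the error it is given and prints a dev_err() message unless that error is -EPROBE_DEFER. For -EPROBE_DEFER it records the reason, which then shows up in the devices_deferred debugfs file. The userspace mock below models only that contract. mock_device, mock_dev_err_probe() and mock_probe() are hypothetical names, not kernel API. EPROBE_DEFER is redefined locally because it is kernel-internal.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Kernel-internal errno (include/linux/errno.h); not visible to userspace. */
#define EPROBE_DEFER 517

/* Minimal stand-in for struct device; fields are hypothetical. */
struct mock_device {
	const char *name;
	char deferred_reason[128]; /* roughly what devices_deferred reports */
	int errors_logged;         /* number of dev_err()-level messages */
};

/*
 * Userspace mock of the dev_err_probe() contract: always return err;
 * log at error level unless err is -EPROBE_DEFER, in which case only
 * remember the reason for the deferred-probe list.
 */
static int mock_dev_err_probe(struct mock_device *dev, int err,
			      const char *fmt, ...)
{
	char msg[96];
	va_list args;

	assert(err < 0); /* callers pass a negative errno */

	va_start(args, fmt);
	vsnprintf(msg, sizeof(msg), fmt, args);
	va_end(args);

	if (err != -EPROBE_DEFER) {
		fprintf(stderr, "%s: error %d: %s", dev->name, err, msg);
		dev->errors_logged++;
	} else {
		snprintf(dev->deferred_reason, sizeof(dev->deferred_reason),
			 "%s", msg);
	}
	return err;
}

/* Hypothetical probe path using the one-line "return dev_err_probe()" idiom. */
static int mock_probe(struct mock_device *dev, int clk_err)
{
	if (clk_err)
		return mock_dev_err_probe(dev, clk_err, "failed to get clock\n");
	return 0;
}
```

A driver that still calls dev_err() directly never reaches this code path, so nothing records its failures. That is the printk/debugfs mismatch described above.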

> > > Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
> > > Cc: Guillaume Tucker <gtucker.collabora@xxxxxxxxx>
> > > Suggested-by: Enric Balletbò <enric.balletbo@xxxxxxxxxxxxx>
> > > Signed-off-by: Adrian Ratiu <adrian.ratiu@xxxxxxxxxxxxx>
> > > ---
> > >  drivers/base/core.c | 76 +++++++++++++++++++++++++++++++++++++++++++--
> > >  lib/Kconfig.debug   | 23 ++++++++++++++
> > >  2 files changed, 96 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index b8a8c96dca58..74bf057234b8 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -9,7 +9,9 @@
> > >   */
> > >
> > >  #include <linux/acpi.h>
> > > +#include <linux/circ_buf.h>
> > >  #include <linux/cpufreq.h>
> > > +#include <linux/debugfs.h>
> > >  #include <linux/device.h>
> > >  #include <linux/err.h>
> > >  #include <linux/fwnode.h>
> > > @@ -53,6 +55,15 @@ static DEFINE_MUTEX(fwnode_link_lock);
> > >  static bool fw_devlink_is_permissive(void);
> > >  static bool fw_devlink_drv_reg_done;
> > >
> > > +#ifdef CONFIG_DEBUG_FS_PROBE_ERR
> > > +#define PROBE_ERR_BUF_ELEM_SIZE 64
> > > +#define PROBE_ERR_BUF_SIZE (1 << CONFIG_DEBUG_FS_PROBE_ERR_BUF_SHIFT)
> > > +static struct circ_buf probe_err_crbuf;
> > > +static char failed_probe_buffer[PROBE_ERR_BUF_SIZE];
> > > +static DEFINE_MUTEX(failed_probe_mutex);
> > > +static struct dentry *devices_failed_probe;
> > > +#endif
> >
> > All of this just for a log buffer? The kernel provides a great one,
> > printk, let's not create yet-another-log-buffer if at all possible
> > please.
>
> Yes, that is correct, this is essentially duplicating information already
> exposed via the printk buffer.

Not good, I will not take this for that reason alone. Also I don't want
to maintain something like this for the next 10+ years for no good
reason.

> > If the existing messages are "hard to parse", what can we do to make
> > them "easier" that would allow systems to do something with them?
> >
> > What _do_ systems want to do with this information anyway? What does it
> > help with exactly?
> >
>
> I know driver core probe error message formats are unlikely to change over
> time and debugfs in theory is as "stable" as printk, but I think the main
> concern is to find a more reliable way than parsing printk to extract
> probe errors, similar to what devices_deferred in debugfs already provides.

But what exactly are you trying to detect? And what are you going to do
if you detect it?
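Here is a concrete illustration of the printk route. When a probe fails with an error other than -EPROBE_DEFER, the driver core logs a warning. In kernels of this period that warning reads "<driver>: probe of <device> failed with error <n>". Treat the exact wording as an assumption: it is not a stable ABI and may differ in other kernel versions. Below is a minimal userspace sketch that pulls those records out of dmesg output. struct probe_failure and parse_probe_failure() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed "probe of ... failed" record. */
struct probe_failure {
	char driver[64];
	char device[64];
	int err;
};

/*
 * Hypothetical helper: parse one dmesg line of the assumed form
 *   "[    2.345678] <driver>: probe of <device> failed with error <n>"
 * Returns true and fills *out on a match, false otherwise.
 */
static bool parse_probe_failure(const char *line, struct probe_failure *out)
{
	const char *marker, *start;
	size_t len;

	assert(line != NULL && out != NULL);

	marker = strstr(line, ": probe of ");
	if (!marker)
		return false;

	/* The driver name is the token immediately before the marker:
	 * walk back until a space or the ']' closing the timestamp. */
	start = marker;
	while (start > line && start[-1] != ' ' && start[-1] != ']')
		start--;
	len = (size_t)(marker - start);
	if (len == 0 || len >= sizeof(out->driver))
		return false;
	memcpy(out->driver, start, len);
	out->driver[len] = '\0';

	/* Literal text must match exactly; %63s stops at whitespace. */
	return sscanf(marker, ": probe of %63s failed with error %d",
		      out->device, &out->err) == 2;
}
```

Given "[    2.345678] foo_drv: probe of 1-0050 failed with error -22", the function returns driver "foo_drv", device "1-0050" and error -22. It rejects unrelated lines. The fragility is visible here: the match depends on the message text staying the same.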

> The idea in my specific case is to be able to reliably run driver tests in
> KernelCI for expected or unexpected probe errors like -EINVAL.

How about making a "real" test for this type of thing and we add that to
the kernel itself? Wouldn't that be a better thing to have?
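The test could live in the kernel or stay in KernelCI. Either way, the pass/fail decision described above (expected versus unexpected probe errors such as -EINVAL) comes down to comparing observed failures against a per-platform list of expected ones. A userspace sketch of just that comparison follows. The allowlist format and every name in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (device, errno) pair, used both for observed failures
 * and for the allowlist of failures a platform is expected to show. */
struct expected_failure {
	const char *device; /* dev_name(), e.g. "1-0050" */
	int err;            /* negative errno, e.g. -22 for -EINVAL */
};

/* True if (device, err) appears in the allowlist. */
static bool is_expected(const char *device, int err,
			const struct expected_failure *allow, size_t n_allow)
{
	for (size_t i = 0; i < n_allow; i++)
		if (allow[i].err == err && strcmp(allow[i].device, device) == 0)
			return true;
	return false;
}

/* Count observed failures that are NOT on the allowlist; a test
 * would pass only when this returns zero. */
static size_t count_unexpected(const struct expected_failure *seen, size_t n_seen,
			       const struct expected_failure *allow, size_t n_allow)
{
	size_t bad = 0;

	assert(seen != NULL || n_seen == 0);
	for (size_t i = 0; i < n_seen; i++)
		if (!is_expected(seen[i].device, seen[i].err, allow, n_allow))
			bad++;
	return bad;
}
```

Take an allowlist containing only ("1-0050", -22). If the observed set is ("1-0050", -22) and ("spi0.0", -110), the function reports one unexpected failure: the -110 (-ETIMEDOUT) entry.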

thanks,

greg k-h