Re: [PATCH v2 2/4] devcoredump: Add dev_coredumpm_timeout()

From: Souza, Jose
Date: Mon Mar 04 2024 - 09:29:28 EST


On Fri, 2024-03-01 at 09:38 +0100, Johannes Berg wrote:
> > On Wed, 2024-02-28 at 17:56 +0000, Souza, Jose wrote:
> > > >
> > > > In my opinion, the timeout should depend on the type of device driver.
> > > >
> > > > In the case of server-class Ethernet cards, where corporate users automate most tasks, five minutes might even be considered excessive.
> > > >
> > > > For our case, GPUs, users might experience minor glitches and only search for what happened after finishing their current task (writing an email,
> > > > ending a gaming match, watching a YouTube video, etc.).
> > > > If they land on https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html or the future Xe version of that page, following the
> > > > instructions alone may take inexperienced Linux users more than five minutes.
> >
> > That's all not wrong, but I don't see why you wouldn't automate this
> > even on end user machines? I feel you're boxing the problem in by
> > wanting to solve it entirely in the kernel?

The other part of the stack that we provide are the libraries implementing Vulkan and OpenGL APIs, I don't think we could ship scripts that needs
elevated privileges to read and store coredump.

> >
> > > > I have set the timeout to one hour in the Xe driver, but this could increase if we start receiving user complaints.
> >
> > At an hour now, people will probably start arguing that "indefinitely"
> > is about right? But at that point you're probably back to persisting
> > them on disk anyway? Or maybe glitches happen during logout/shutdown ...

i915 driver don't use coredump and it persist the error dump in memory until user frees it or reboot it and we got no complains.

> >
> > Anyway, I don't want to block this because I just don't care enough
> > about how you do things, but I think the kernel is the wrong place to
> > solve this problem... The intent here was to give some userspace time to
> > grab it (and yes for that 5 minutes is already way too long), not the
> > users. That's also part of the reason we only hold on to a single
> > instance, since I didn't want it to keep consuming more and more memory
> > for it if happens repeatedly.
> >

okay so will move forward with other version applying your suggestion to make dev_coredumpm() static inline and move to the header.

thank you for the feedback

> > johannes