[RFC PATCH v1 1/2] docs: reporting-issue: rework the detailed guide

From: Thorsten Leemhuis
Date: Tue Mar 26 2024 - 08:22:48 EST


Rework the detailed step-by-step guide for various reasons:

* Simplify the search with the help of lore.kernel.org/all/, which did
not exist when the text was written.

* Make use of the recently added document
Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst,
which covers many steps this text partly covered way better.

* The 'quickly report a stable regression to the stable team' approach
hardly worked out: most of the time the regression was not known yet.
Try a different approach using the regressions list.

* Reports about stable/longterm regressions most of the time were
greeted with a brief reply along the lines of 'Is mainline affected as
well?'; this is needed to determine who is responsible, so we might as
well make the reporter check that before sending the report (which
verify-bugs-and-bisect-regressions.rst already tells them to do, too).

* A lot of fine tuning after seeing what people were struggling with.

FIXME: adjust the entries in the reference section to match these
changes.

Not-signed-off-by: Thorsten Leemhuis <linux@xxxxxxxxxxxxx>
---
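The reworked guide asks reporters to check whether the kernel was 'tainted'.
A minimal sketch of such a check follows. It assumes the flag letters and bit
order documented in Documentation/admin-guide/tainted-kernels.rst (bit 0 is
'P', bit 12 is 'O', and so on); `decode_taint` is a hypothetical helper name
and not something this patch adds.

```shell
#!/bin/sh
# decode_taint: hypothetical helper, not part of the patch.
# Takes the number read from /proc/sys/kernel/tainted and prints the letters
# of the taint flags that are set, assuming the bit order from
# Documentation/admin-guide/tainted-kernels.rst.
decode_taint() {
    value=$1
    if [ "$value" -eq 0 ]; then
        echo "not tainted"
        return 0
    fi
    flags=""
    bit=0
    # The position of each letter in this list is the bit it belongs to.
    for letter in P F S R M B U D A W C I O E L K X T; do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            flags="$flags${flags:+ }$letter"
        fi
        bit=$((bit + 1))
    done
    echo "tainted: ${flags:-unknown flags}"
}
```

On a running system it could be called as
`decode_taint "$(cat /proc/sys/kernel/tainted)"`. The kernel sources also ship
tools/debugging/kernel-chktaint, which explains each flag in more detail.
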
.../admin-guide/reporting-issues.rst | 391 ++++++++++--------
1 file changed, 210 insertions(+), 181 deletions(-)
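
The 'stable tag' check described in the subsection 'Resolving non-regressions
only occurring in stable or longterm kernels' can be sketched as follows.
`has_stable_tag` is a hypothetical helper; as an assumption, it only looks for
a 'Cc:' line mentioning 'stable@' and does not verify the rest of the address
or any version number after the tag.

```shell
#!/bin/sh
# has_stable_tag: hypothetical helper, not part of the patch.
# Reads a commit message on stdin and succeeds if it contains a line that
# starts with 'Cc:' (case-insensitive) and mentions 'stable@'.
has_stable_tag() {
    grep -qiE '^[[:space:]]*cc:.*stable@'
}
```

Inside a clone of the mainline repository it could be used like
`git log -1 --format=%B <commit> | has_stable_tag && echo "has a stable tag"`,
where `<commit>` stands for the ID of the fix you located.
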

diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
index 2fd5a030235ad0..e6083946c146e8 100644
--- a/Documentation/admin-guide/reporting-issues.rst
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -48,187 +48,216 @@ Once the report is out, answer any questions that come up and help where you
can. That includes keeping the ball rolling by occasionally retesting with newer
releases and sending a status update afterwards.

-Step-by-step guide how to report issues to the kernel maintainers
-=================================================================
-
-The above TL;DR outlines roughly how to report issues to the Linux kernel
-developers. It might be all that's needed for people already familiar with
-reporting issues to Free/Libre & Open Source Software (FLOSS) projects. For
-everyone else there is this section. It is more detailed and uses a
-step-by-step approach. It still tries to be brief for readability and leaves
-out a lot of details; those are described below the step-by-step guide in a
-reference section, which explains each of the steps in more detail.
-
-Note: this section covers a few more aspects than the TL;DR and does things in
-a slightly different order. That's in your interest, to make sure you notice
-early if an issue that looks like a Linux kernel problem is actually caused by
-something else. These steps thus help to ensure the time you invest in this
-process won't feel wasted in the end:
-
- * Are you facing an issue with a Linux kernel a hardware or software vendor
- provided? Then in almost all cases you are better off to stop reading this
- document and reporting the issue to your vendor instead, unless you are
- willing to install the latest Linux version yourself. Be aware the latter
- will often be needed anyway to hunt down and fix issues.
-
- * Perform a rough search for existing reports with your favorite internet
- search engine; additionally, check the archives of the `Linux Kernel Mailing
- List (LKML) <https://lore.kernel.org/lkml/>`_. If you find matching reports,
- join the discussion instead of sending a new one.
-
- * See if the issue you are dealing with qualifies as regression, security
- issue, or a really severe problem: those are 'issues of high priority' that
- need special handling in some steps that are about to follow.
-
- * Make sure it's not the kernel's surroundings that are causing the issue
- you face.
-
- * Create a fresh backup and put system repair and restore tools at hand.
-
- * Ensure your system does not enhance its kernels by building additional
- kernel modules on-the-fly, which solutions like DKMS might be doing locally
- without your knowledge.
-
- * Check if your kernel was 'tainted' when the issue occurred, as the event
- that made the kernel set this flag might be causing the issue you face.
-
- * Write down coarsely how to reproduce the issue. If you deal with multiple
- issues at once, create separate notes for each of them and make sure they
- work independently on a freshly booted system. That's needed, as each issue
- needs to get reported to the kernel developers separately, unless they are
- strongly entangled.
-
- * If you are facing a regression within a stable or longterm version line
- (say something broke when updating from 5.10.4 to 5.10.5), scroll down to
- 'Dealing with regressions within a stable and longterm kernel line'.
-
- * Locate the driver or kernel subsystem that seems to be causing the issue.
- Find out how and where its developers expect reports. Note: most of the
- time this won't be bugzilla.kernel.org, as issues typically need to be sent
- by mail to a maintainer and a public mailing list.
-
- * Search the archives of the bug tracker or mailing list in question
- thoroughly for reports that might match your issue. If you find anything,
- join the discussion instead of sending a new report.
-
-After these preparations you'll now enter the main part:
-
- * Unless you are already running the latest 'mainline' Linux kernel, better
- go and install it for the reporting process. Testing and reporting with
- the latest 'stable' Linux can be an acceptable alternative in some
- situations; during the merge window that actually might be even the best
- approach, but in that development phase it can be an even better idea to
- suspend your efforts for a few days anyway. Whatever version you choose,
- ideally use a 'vanilla' build. Ignoring these advices will dramatically
- increase the risk your report will be rejected or ignored.
-
- * Ensure the kernel you just installed does not 'taint' itself when
- running.
-
- * Reproduce the issue with the kernel you just installed. If it doesn't show
- up there, scroll down to the instructions for issues only happening with
- stable and longterm kernels.
-
- * Optimize your notes: try to find and write the most straightforward way to
- reproduce your issue. Make sure the end result has all the important
- details, and at the same time is easy to read and understand for others
- that hear about it for the first time. And if you learned something in this
- process, consider searching again for existing reports about the issue.
-
- * If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
- decoding the kernel log to find the line of code that triggered the error.
-
- * If your problem is a regression, try to narrow down when the issue was
- introduced as much as possible.
-
- * Start to compile the report by writing a detailed description about the
- issue. Always mention a few things: the latest kernel version you installed
- for reproducing, the Linux Distribution used, and your notes on how to
- reproduce the issue. Ideally, make the kernel's build configuration
- (.config) and the output from ``dmesg`` available somewhere on the net and
- link to it. Include or upload all other information that might be relevant,
- like the output/screenshot of an Oops or the output from ``lspci``. Once
- you wrote this main part, insert a normal length paragraph on top of it
- outlining the issue and the impact quickly. On top of this add one sentence
- that briefly describes the problem and gets people to read on. Now give the
- thing a descriptive title or subject that yet again is shorter. Then you're
- ready to send or file the report like the MAINTAINERS file told you, unless
- you are dealing with one of those 'issues of high priority': they need
- special care which is explained in 'Special handling for high priority
- issues' below.
-
- * Wait for reactions and keep the thing rolling until you can accept the
- outcome in one way or the other. Thus react publicly and in a timely manner
- to any inquiries. Test proposed fixes. Do proactive testing: retest with at
- least every first release candidate (RC) of a new mainline version and
- report your results. Send friendly reminders if things stall. And try to
- help yourself, if you don't get any help or if it's unsatisfying.
-
-
-Reporting regressions within a stable and longterm kernel line
---------------------------------------------------------------
-
-This subsection is for you, if you followed above process and got sent here at
-the point about regression within a stable or longterm kernel version line. You
-face one of those if something breaks when updating from 5.10.4 to 5.10.5 (a
-switch from 5.9.15 to 5.10.5 does not qualify). The developers want to fix such
-regressions as quickly as possible, hence there is a streamlined process to
-report them:
-
- * Check if the kernel developers still maintain the Linux kernel version
- line you care about: go to the `front page of kernel.org
- <https://kernel.org/>`_ and make sure it mentions
- the latest release of the particular version line without an '[EOL]' tag.
-
- * Check the archives of the `Linux stable mailing list
- <https://lore.kernel.org/stable/>`_ for existing reports.
-
- * Install the latest release from the particular version line as a vanilla
- kernel. Ensure this kernel is not tainted and still shows the problem, as
- the issue might have already been fixed there. If you first noticed the
- problem with a vendor kernel, check a vanilla build of the last version
- known to work performs fine as well.
-
- * Send a short problem report to the Linux stable mailing list
- (stable@xxxxxxxxxxxxxxx) and CC the Linux regressions mailing list
- (regressions@xxxxxxxxxxxxxxx); if you suspect the cause in a particular
- subsystem, CC its maintainer and its mailing list. Roughly describe the
- issue and ideally explain how to reproduce it. Mention the first version
- that shows the problem and the last version that's working fine. Then
- wait for further instructions.
-
-The reference section below explains each of these steps in more detail.
-
-
-Reporting issues only occurring in older kernel version lines
--------------------------------------------------------------
-
-This subsection is for you, if you tried the latest mainline kernel as outlined
-above, but failed to reproduce your issue there; at the same time you want to
-see the issue fixed in a still supported stable or longterm series or vendor
-kernels regularly rebased on those. If that the case, follow these steps:
-
- * Prepare yourself for the possibility that going through the next few steps
- might not get the issue solved in older releases: the fix might be too big
- or risky to get backported there.
-
- * Perform the first three steps in the section "Dealing with regressions
- within a stable and longterm kernel line" above.
-
- * Search the Linux kernel version control system for the change that fixed
- the issue in mainline, as its commit message might tell you if the fix is
- scheduled for backporting already. If you don't find anything that way,
- search the appropriate mailing lists for posts that discuss such an issue
- or peer-review possible fixes; then check the discussions if the fix was
- deemed unsuitable for backporting. If backporting was not considered at
- all, join the newest discussion, asking if it's in the cards.
-
- * One of the former steps should lead to a solution. If that doesn't work
- out, ask the maintainers for the subsystem that seems to be causing the
- issue for advice; CC the mailing list for the particular subsystem as well
- as the stable mailing list.
-
-The reference section below explains each of these steps in more detail.
+The detailed step-by-step guide on reporting Linux kernel issues
+================================================================
+
+The short guide above might be all that is needed for people already familiar
+with reporting issues to Free/Libre & Open Source Software projects. For
+everyone else there is this more detailed step-by-step guide. It still tries to
+be brief and leaves a lot of details occasionally relevant to a reference
+section, which holds additional information for almost all of the steps.
+
+Note: this step-by-step guide covers more aspects than the short guide above and
+does things in a slightly different order; that is in your interest, as it makes
+sure you notice early on when you are on the wrong track.
+
+* Be aware you must have or install a fresh vanilla mainline kernel for
+ reporting; you furthermore must remove any installed software that builds or
+ relies on externally developed kernel modules. There is also a decent
+ chance you will have to build a patched kernel yourself to help resolve the
+ issue.
+
+ In case that sounds too demanding to you, better report the issue to the vendor
+ who built your kernel (usually your Linux distributor or hardware manufacturer).
+
+* Skim the output of ``journalctl -k`` for any indicators of problems that might
+ lead to your bug.
+
+* Check if the kernel was already 'tainted' when the issue first occurred: the
+ event that led to this flag being set might cause your issue, even if it looks
+ totally unrelated.
+
+* Consider if some glitch in your kernel's environment makes it misbehave -- like
+ a hardware defect, a misconfigured system firmware, an overclocked component,
+ a broken initramfs, an inconsistent file system, broken firmware files,
+ a pre-release compiler, or a malfunctioning/misconfigured Linux distribution.
+
+* If you deal with multiple issues at once, process them separately from now on.
+ If there is even a small chance they are related, briefly mention the other
+ issues in each of the reports later, ideally while linking to the others.
+
+* Search for fixes and earlier reports referring to an issue like yours. Start
+ by checking `lore <https://lore.kernel.org/all/>`_. Then perform a general
+ internet search. Consult :ref:`MAINTAINERS <maintainers>` to determine where
+ developers of the affected code expect bugs to be submitted to; if in doubt,
+ use your best guess to determine the driver or kernel subsystem. If its
+ developers have a dedicated mailing list not archived on lore, search its
+ archives; when they are among the few that use one of
+ various bug trackers, search it as well. Note, bugzilla.kernel.org
+ is the right place to file bugs only for a small percentage of the kernel; if
+ you submit bugs for other code there, they most likely will be ignored.
+
+ If you find fixes, try them. If you find matching reports, evaluate which
+ is wiser: joining the discussion or reporting the problem anew. In the latter
+ case mention and link to the related report you found; after you submit it,
+ add a note to the related report along the lines of 'I have a problem that
+ might be the same or related, for details see <link_to_your_report>'.
+
+* Are you facing a regression? One still occurring with a kernel from the
+ affected series less than two weeks (ideally: one week) old? A kernel that is
+ vanilla or close to it? Then send a brief (one or two short paragraphs) email
+ to <regressions@xxxxxxxxxxxxxxx> asking if the problem is known already.
+ Consider proceeding with this guide immediately to confine the problem and
+ report it properly; definitely do so, if you don't receive any helpful
+ answer within three days.
+
+* Evaluate if the issue you are dealing with qualifies as a regression, security
+ issue, or a really severe problem: those need special handling in some of the
+ following steps.
+
+* Write down coarsely how to reproduce the issue on a freshly booted system.
+
+* Verify the bug and potentially bisect any regression as described in
+ Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst;
+ alternatively handle the tasks it covers on your own:
+
+ * Verify the bug occurs with an up-to-date kernel. For regressions within a
+ still supported stable or longterm series this means the latest release from
+ that series. In all other cases, this means a mainline release, pre-release, or
+ snapshot ideally less than one and at most two weeks old; the latest release
+ from the newest stable series might work as well, especially if the series
+ is based on a mainline version released in the past two weeks.
+
+ * In case of a regression, consider bisecting it. If it is one within a stable
+ or longterm series, you must verify if current mainline is affected as well.
+
+ * All kernels used for verifying and reporting bugs must be free of externally
+ developed modules (like Nvidia's graphics drivers, OpenZFS, or VirtualBox's
+ host drivers). The kernels also should be built from pristine (aka 'vanilla')
+ Linux sources, but lightly patched ones might work, too. The kernels
+ furthermore should not be 'tainted' when the issue occurs.
+
+ Note, don't skip this step or take its demands lightly, as there is a
+ decent chance your report otherwise will be ignored or welcomed brusquely.
+
+* If you learned anything new about the bug while following this guide so far,
+ consider searching once more for earlier reports and fixes.
+
+* Were you unable to reproduce the bug with a current mainline kernel, but want
+ to see it fixed in a stable or longterm series? Is it also not a regression?
+ Then move over to 'Resolving non-regressions only occurring in stable or
+ longterm kernels'.
+
+* Optional: if your failure involves a 'panic', 'Oops', 'warning', or 'BUG',
+ ideally decode the included stack trace.
+
+* Prepare the report by writing a detailed description of the issue.
+
+ Always mention the Linux distribution and the kernel version used for the
+ verification; also include your notes on how to reproduce the issue. If your
+ failure involves a 'panic', 'Oops', 'warning', or 'BUG', include a copy or
+ photo of it.
+
+ Most of the time you also want to describe relevant aspects of your
+ environment, like the machine's model name, the relevant hardware components,
+ or the version of related userspace drivers. Often you want to also save the
+ output of ``journalctl -k`` to a file you later attach to your report or
+ upload somewhere and link to.
+
+ If other aspects of the environment are likely relevant, attach or
+ upload & link detailed information about them as well, like the output from
+ commands such as ``lsblk``, ``lspci``, ``lsusb.py`` and
+ ``grep -s '' /sys/class/dmi/id/*``.
+
+ If anything in the attached or linked files is certainly relevant, copy that
+ part to the body of the report as well to make it easily accessible.
+ Furthermore make sure to not overload the report with many or huge
+ attachments: developers will ask for additional data when needed.
+
+ Ensure both the subject and the first sentence of the report outline the core
+ of the problem and get people interested enough to read on.
+
+ When finished, review and optimize the report once more to make it as
+ straightforward as possible and the core of the problem easy to grasp.
+
+* Submit your report in the appropriate way, which depends on the outcome of the
+ verification:
+
+ * In case you deal with a security issue, follow the instructions in
+ Documentation/process/security-bugs.rst.
+
+ * Are you facing a regression within a stable or longterm kernel series you
+ were unable to reproduce with a fresh mainline kernel? Then report it by
+ email to the stable team while CCing the regressions list (To:
+ Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>,
+ Sasha Levin <sashal@xxxxxxxxxx>; CC: stable@xxxxxxxxxxxxxxx,
+ regressions@xxxxxxxxxxxxxxx).
+
+ * In all other cases, submit the report as specified in MAINTAINERS. In case
+ of a regression you have to report by mail, CC the regressions list
+ (regressions@xxxxxxxxxxxxxxx); when you know the culprit, also CC everyone
+ in its 'Signed-off-by' chain. In case of a regression you had to file in a
+ bug tracker, write a short heads-up email with a link to the report to the
+ list and everyone that signed the patch off, if the culprit is known.
+
+ Did you send the brief inquiry about a regression mentioned earlier? Then in
+ both of these cases keep it involved: either send your report as a reply to
+ the earlier inquiry while adding relevant recipients or send a quick note
+ with a link to the proper report.
+
+* Wait for reactions and keep the ball rolling until you can accept the outcome
+ in one way or the other. Among other things, that means:
+
+ * React publicly and in a timely manner to any inquiries.
+
+ * Try to quickly test proposed fixes.
+
+ * Perform proactive testing: retest with at least every first release
+ candidate (e.g. -rc1) of a new mainline version and report your findings in
+ a reply to your report.
+
+ * If things stall for more than three or four weeks, check if that happened
+ due to an inadequate report of yours; if not, send a friendly inquiry.
+
+ * Be aware that nobody is obliged to help you, unless it is a recent
+ regression, a security issue, or a really severe problem; hence try to help
+ yourself, if you don't receive any or only unsatisfying help.
+
+Resolving non-regressions only occurring in stable or longterm kernels
+----------------------------------------------------------------------
+
+Are you facing an issue in a still supported stable or longterm series you were
+unable to reproduce with a fresh mainline kernel? An issue that is also not a
+regression and still happens in the series' latest release? In that case follow
+these steps:
+
+* Prepare yourself for the possibility that trying to get the issue resolved
+ in the affected stable or longterm series might not work out: the fix might be
+ too big or risky to include there.
+
+* Search Linux' mainline Git repository or lore for the change that resolved the
+ issue; when unsuccessful, consider using a bisection to find it. Then check
+ the description of the fix for a 'stable tag', e.g. a line like
+ 'Cc: <stable@xxxxxxxxxxxxxxx>':
+
+ * In case there is such a tag the change is already scheduled for backporting.
+ Usually it will be picked up within two or three weeks after being merged to
+ mainline. Note, a version number after the tag might limit backporting to a
+ series that is newer than the one you care for; plans to backport a change
+ sometimes are also discarded. In such cases search lore or contact the
+ involved developers for details, but you likely are out of luck.
+
+ * If there is no stable tag, check the mailing list archives if backporting
+ nevertheless is in the works. If not, search for the review of the fix and
+ check if backporting to stable and longterm kernels is planned or was
+ rejected. If it's neither, send a reply asking the developers if backporting
+ to the series is an option. Note, they might greenlight it, but be unwilling to
+ handle the job themselves -- in that case consider testing and submitting the
+ fix and everything it depends on as explained in
+ Documentation/process/stable-kernel-rules.rst.
+
+ In case you have trouble locating the fix or the discussion about it, consider
+ asking the maintainers and developers of the affected subsystem for advice.


Reference section: Reporting issues to the kernel maintainers
--
2.44.0