Re: autofs: add autofs_parse_fd()

From: Anders Roxell
Date: Mon Oct 23 2023 - 09:57:59 EST


On Mon, 23 Oct 2023 at 09:35, Ian Kent <raven@xxxxxxxxxx> wrote:
>
> On 23/10/23 08:48, Ian Kent wrote:
> > On 20/10/23 21:09, Ian Kent wrote:
> >> On 20/10/23 19:23, Arnd Bergmann wrote:
> >>> On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
> >>>> On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
> >>>>> On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> >>>>>> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
> >>>>>>> The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit
> >>>>>>> rootfs we call
> >>>>>>> it as compat mode boot testing. Recently it started to failed to
> >>>>>>> get login
> >>>>>>> prompt.
> >>>>>>>
> >>>>>>> We have not seen any kernel crash logs.
> >>>>>>>
> >>>>>>> Anders, bisection is pointing to first bad commit,
> >>>>>>> 546694b8f658 autofs: add autofs_parse_fd()
> >>>>>>>
> >>>>>>> Reported-by: Linux Kernel Functional Testing <lkft@xxxxxxxxxx>
> >>>>>>> Reported-by: Anders Roxell <anders.roxell@xxxxxxxxxx>
> >>>>>> I tried to find something in that commit that would be different
> >>>>>> in compat mode, but don't see anything at all -- this appears
> >>>>>> to be just a simple refactoring of the code, unlike the commits
> >>>>>> that immediately follow it and that do change the mount
> >>>>>> interface.
> >>>>>>
> >>>>>> Unfortunately this makes it impossible to just revert the commit
> >>>>>> on top of linux-next. Can you double-check your bisection by
> >>>>>> testing 546694b8f658 and the commit before it again?
> >>>>> I tried these two patches again:
> >>>>> 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot
> >>>>> bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
> >>>>>
> >>>> One difference that I notice between those two patches is that we no
> >>>> long call autofs_prepare_pipe(). We just call autofs_check_pipe().
> >>> Indeed, so some of the f_flags end up being different. I assumed
> >>> this was done intentionally, but it might be worth checking if
> >>> the patch below makes any difference when the flags get put
> >>> back the way they were. This is probably not the correct fix, but
> >>> may help figure out what is going on. It should apply to anything
> >>> from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current
> >>> linux-next:
> >>>
> >>> --- a/fs/autofs/inode.c
> >>> +++ b/fs/autofs/inode.c
> >>> @@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block
> >>> *s, struct fs_context *fc)
> >>> pr_debug("pipe fd = %d, pgrp = %u\n",
> >>> sbi->pipefd, pid_nr(sbi->oz_pgrp));
> >>> + /* We want a packet pipe */
> >>> + sbi->pipe->f_flags |= O_DIRECT;
> >>> + /* We don't expect -EAGAIN */
> >>> + sbi->pipe->f_flags &= ~O_NONBLOCK;
> >>> +
> >>
> >>
> >> That makes sense, we do want a packet pipe and that does also mean
> >>
> >> we don't want a non-blocking pipe, it will be interesting to see
> >>
> >> if that makes a difference. It's been a long time since Linus
> >>
> >> implemented that packet pipe and I can't remember now what the
> >>
> >> case was that lead to it.
> >
> > After thinking about this over the weekend I'm pretty sure my mistake
> >
> > is dropping the call to autofs_prepare_pipe() without adding the tail
> >
> > end of it into autofs_parse_fd().
> >
> >
> > To explain a bit of history which I'll include in the fix description.
> >
> > During autofs v5 development I decided to stay with the existing usage
> >
> > instead of changing to a packed structure for autofs <=> user space
> >
> > communications which turned out to be a mistake on my part.
> >
> >
> > Problems arose and they were fixed by allowing for the 64 bit to 32 bit
> >
> > size difference in the automount(8) code.
> >
> >
> > Along the way systemd started to use autofs and eventually encountered
> >
> > this problem too. systemd refused to compensate for the length difference
> >
> > insisting it be fixed in the kernel. Fortunately Linus implemented the
> >
> > packetized pipe which resolved the problem in a straight forward and
> >
> > simple way.
> >
> >
> > So I pretty sure that the cause of the problem is the inadvertent
> > dropping
> >
> > of the flags setting in autofs_fill_super() that Arnd spotted although I
> >
> > don't think putting it in autofs_fill_super() is the right thing to do.
> >
> >
> > I'll produce a patch today which includes most of this explanation for
> >
> > future travelers ...
>
> So I have a patch.
>
>
> I'm of two minds whether to try and use the instructions to reproduce this
>
> or not because of experiences I have had with other similar testing
> automation
>
> systems that claim to provide a reproducer and end up a huge waste of
> time and
>
> are significantly frustrating.
>
>
> Can someone please perform a test for me once I provide the patch?

Just tested it, and it passed our tests. Added a tested by flag in your patch.

Thanks for the prompt fix.

Cheers,
Anders





>
>
> Ian
>
> >
> >
> >>
> >>
> >> Ian
> >>
> >>> sbi->flags &= ~AUTOFS_SBI_CATATONIC;
> >>> /*