Re: Kernel 3.0: Instant kernel crash when mounting CIFS (also crashes with linux-3.1-rc2)

From: Jeff Layton
Date: Thu Aug 18 2011 - 14:32:16 EST


On Thu, 18 Aug 2011 12:25:50 -0500
Steve French <smfrench@xxxxxxxxx> wrote:

> On Thu, Aug 18, 2011 at 12:16 PM, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
> >
> >
> > On Thu, 18 Aug 2011, Jeff Layton wrote:
> >
> >> On Thu, 18 Aug 2011 09:15:36 -0400 (EDT)
> >> Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
> >>
> >>>
> >>>
> >>> On Thu, 18 Aug 2011, Jeff Layton wrote:
> >>>
> >>>> On Thu, 18 Aug 2011 08:22:44 -0400 (EDT)
> >>>> Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>>> Justin.
> >>>>>
> >>>>
> >>>> To be clear -- incoming in this case is reads or writes?
> >>>
> >>> Reading from the CIFS share (Windows 7).
> >>>
> >>>>
> >>>> Up until 3.0 cifs.ko didn't parallelize writes from a single thread. In
> >>>> 3.0 I added a patchset to increase the allowable wsize and to allow the
> >>>> kernel to issue writes in parallel.
> >>>
> >>> Ahh, good to know, have not tried writes yet.
> >>>
> >>>>
> >>>> Reads still suffer from the same problem however. I'm working on a
> >>>> patchset that should do the same thing for them, but it requires a
> >>>> fairly substantial overhaul of the receive codepaths.
> >>>
> >>> Ok, that explains it then, thanks.
> >
> >
> > Hi,
> >
> > Watching the rsync, it ran for a while, then:
> >
> > rsync: send_files failed to open "/cifs/w1/r1/data/hs12/f4_0.JPG": Cannot
> > allocate memory (12)
> > rsync: send_files failed to open "/cifs/w1/r1/data/hs12/f4_1.JPG": Cannot
> > allocate memory (12)
> > rsync: send_files failed to open "/cifs/w1/r1/data/hs12/f4_2.JPG": Cannot
> > allocate memory (12)
> > rsync: send_files failed to open "/cifs/w1/r1/data/hs12/f4_0.JPG": Cannot
> > allocate memory (12)
>
> When we were testing async write to Windows 7, Pavel mentioned to me
> another Windows 7 bug, which may be what you are hitting.
>
> Under stress of simultaneous operations, Windows 7 server will sometimes start
> responding with the STATUS_INSUFF_SERVER_RESOURCES error code
> (mapped to POSIX error ENOMEM by the Linux cifs kernel client).
> Pavel solved it by setting MaxWorkItems to 4096 in the Windows 7 registry.
>
> If anyone knows whether Microsoft has fixed this or has a bug #, let us know
> because it is easier to hit with Linux kernel 3.0 and later (to
> Windows 7 server).
>

You may be right, but I'd probably suggest doing a bit of debugging
before assuming that that is the problem. Sniffing with wireshark
should help determine if that is the cause.
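
When reading the capture, the thing to check is the NT status on the failing open responses. The sketch below only illustrates how such a status would surface as the "Cannot allocate memory (12)" that rsync printed. The numeric status value, the one-entry table, the EIO fallback and the `describe` helper are all assumptions made for illustration; this is not the client's actual mapping code.

```python
import errno
import os

# Assumed numeric value of STATUS_INSUFF_SERVER_RESOURCES; verify it
# against what the capture actually shows.
STATUS_INSUFF_SERVER_RESOURCES = 0xC0000205

# Hypothetical one-entry table, for illustration only. The real client
# maps many more status codes than this.
NT_STATUS_TO_ERRNO = {
    STATUS_INSUFF_SERVER_RESOURCES: errno.ENOMEM,
}


def describe(nt_status):
    """Return (errno, message) as userspace would see it for an NT status."""
    # Falling back to EIO for unknown codes is an assumption of this sketch.
    err = NT_STATUS_TO_ERRNO.get(nt_status, errno.EIO)
    return err, os.strerror(err)


if __name__ == "__main__":
    err, msg = describe(STATUS_INSUFF_SERVER_RESOURCES)
    print(f"0x{STATUS_INSUFF_SERVER_RESOURCES:08X} -> {msg} ({err})")
```

If the capture shows a different status on the failed opens, then the MaxWorkItems theory probably does not apply.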

Assuming that that is the case...

Since all of these seem to be barfing on open(), I wonder whether you're
hitting some limit on the number of open filehandles? Not sure how we
can reasonably deal with that if so...
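
One rough way to test the open-filehandle theory is to hold files open on the mount until an open fails and note how many succeeded. This is a minimal sketch, not a finished tool; `count_opens_until_failure` is a hypothetical helper, and the list of paths is whatever you choose to feed it.

```python
import errno
import os


def count_opens_until_failure(paths):
    """Open each path read-only and keep it open; stop at the first error.

    Returns (number_opened, failing_errno_or_None).
    """
    fds = []
    failure = None
    try:
        for path in paths:
            try:
                fds.append(os.open(path, os.O_RDONLY))
            except OSError as exc:
                failure = exc.errno
                break
    finally:
        # Always release the handles, even if something unexpected happens.
        for fd in fds:
            os.close(fd)
    return len(fds), failure
```

A failure with EMFILE would point at the local per-process descriptor limit rather than the server, while ENOMEM after a consistent number of opens would suggest a server-side limit.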

On a semi-related note: Steve, how goes the patch to make cifs respect
the MaxMpx advertised by the server? That may be at least part of the
issue here.
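
To illustrate what respecting MaxMpx means, here is a userspace sketch that caps the number of requests in flight at a server-advertised value. It is not the kernel patch and says nothing about how that patch is implemented; the `MpxLimiter` class and its `peak` counter are hypothetical names used only for this example.

```python
import threading


class MpxLimiter:
    """Cap in-flight requests at a server-advertised maximum (MaxMpx)."""

    def __init__(self, max_mpx):
        if max_mpx < 1:
            raise ValueError("max_mpx must be at least 1")
        self._slots = threading.BoundedSemaphore(max_mpx)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for inspection

    def __enter__(self):
        # Block until the server has room for one more outstanding request.
        self._slots.acquire()
        with self._lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.in_flight -= 1
        self._slots.release()
        return False  # never swallow exceptions from the request
```

Each request would be wrapped in `with limiter:` so that a sender blocks instead of pushing the server past the limit it advertised.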

--
Jeff Layton <jlayton@xxxxxxxxx>