Re: Error: DMA: Out of SW-IOMMU space [was: External USB drives become unresponsive after few hours.]

From: Konrad Rzeszutek Wilk
Date: Mon Apr 20 2015 - 09:03:38 EST


On Sun, Apr 19, 2015 at 05:43:18PM +0200, Dorian Gray wrote:
> I think the case is closed.
> Now that I know it's not USB but the wireless driver, I looked through
> the changelog for the new 3.19.5 kernel and saw this:
>
>
> commit b943e69d33fac1e5f6db57868e061096b0aae67a
> Author: Larry Finger <Larry.Finger@xxxxxxxxxxxx>
> Date: Sat Mar 21 15:16:05 2015 -0500
>
> rtlwifi: Fix IOMMU mapping leak in AP mode
>
> commit be0b5e635883678bfbc695889772fed545f3427d upstream.
>
> Transmission of an AP beacon does not call the TX interrupt service routine,
> which usually does the cleanup. Instead, cleanup is handled in a tasklet
> completion routine. Unfortunately, this routine has a serious bug in that it
> does not release the DMA mapping before it frees the skb, thus one IOMMU
> mapping is leaked for each beacon. The test system failed with no free IOMMU
> mapping slots approximately one hour after hostapd was used to start an AP.
>
> This issue was reported and tested at
> https://github.com/lwfinger/rtlwifi_new/issues/30.
>
> Reported-and-tested-by: Kevin Mullican <kevin@xxxxxxxxxxxx>
> Cc: Kevin Mullican <kevin@xxxxxxxxxxxx>
> Signed-off-by: Shao Fu <shaofu@xxxxxxxxxxx>
> Signed-off-by: Larry Finger <Larry.Finger@xxxxxxxxxxxx>
> Signed-off-by: Kalle Valo <kvalo@xxxxxxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>
>
> Looks very related, especially because my wireless card is also always
> in AP mode. However, I haven't actually been using it lately, which is
> probably why I didn't notice anything related to it (and kept
> focused on USB) until I used dump_dma.
>
> Well, given my minimal knowledge of kernel internals I can't
> be 100% sure that this was it, but so far 3.19.5 has been stable
> (uptime 6hrs and counting).

Sweet!
>
> Thank you Konrad (and everyone else involved) for helping me
> pinpoint the actual culprit.

Sure thing. Happy to have been able to help!
> Jake
>
>
> On 18 April 2015 at 21:59, Dorian Gray <yourfavouritegod@xxxxxxxxx> wrote:
> > On 18 April 2015 at 12:10, Dorian Gray <yourfavouritegod@xxxxxxxxx> wrote:
> >> On 17 April 2015 at 22:06, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> >>> On Fri, Apr 17, 2015 at 05:14:20PM +0200, Dorian Gray wrote:
> >>>> On 16 April 2015 at 20:42, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> >>>> > An easier way is to compile the kernel with CONFIG_DMA_API_DEBUG
> >>>> > and then load the attached module.
> >>>> >
> >>>> > That should tell you who and what else is holding on to the buffers.
> >>>>
> >>>> Ok, I have compiled 3.19.4 w/ CONFIG_DMA_API_DEBUG=y + the module you sent me.
> >>>> Now, I'm not sure if I've done it right - I waited until the error
> >>>> occurred and then modprobe'd dump_dma.
> >>>> I have attached the kernel log, but it doesn't tell me much, if anything...
> >>>
> >>> The network driver is quite hungry for DMA. Did it do the same thing
> >>> in the earlier kernels?
> >>>
> >>> Thanks.
> >>>>
> >>>> Thanks again.
> >>>> Jake
> >>>
> >>>
> >>
> >> Yeah, you're right:
> >>
> >> # grep rtl8192se dump_dma_k3.19.4.log | wc -l
> >> 6789
> >> #
> >> # grep rtl8192se dump_dma_k3.17.8.log | wc -l
> >> 162
> >> #
> >>
> >> So the wlan driver would be the real culprit then..?
> >> I would have never thought...
> >>
> >> I guess I'm gonna test 3.19.4 once more (just to be sure) with
> >> rtl8192se removed and see what happens.
> >>
> >> Thanks!
> >> Jake
> >
> >
> > [update]
> >
> > Ok, 6 hours of uptime (3.19.4 + blacklisted rtl8192se) and everything
> > was fine...
> > However, I was checking periodically and noticed that the number of
> > 'radeon' entries also tends to grow continuously over time, whereas the
> > ethernet driver stays within more or less the same range:
> >
> > # uname -r
> > 3.19.4
> > #
> > # grep -Eo 'radeon|r8169' L1.log | sort | uniq -c
> > 62 r8169
> > 4183 radeon
> > #
> > # grep -Eo 'radeon|r8169' L2.log | sort | uniq -c
> > 33 r8169
> > 5582 radeon
> > #
> > # grep -Eo 'radeon|r8169' L3.log | sort | uniq -c
> > 54 r8169
> > 7007 radeon
> > #
> > # grep -Eo 'radeon|r8169' L4.log | sort | uniq -c
> > 49 r8169
> > 7429 radeon
> > #
> > # grep -Eo 'radeon|r8169' L5.log | sort | uniq -c
> > 34 r8169
> > 9360 radeon
> > #
> >
> > It doesn't grow that much in 3.17.8:
> >
> > # uname -r
> > 3.17.8
> > #
> > # grep -Eo 'radeon|r8169|rtl8192se' L1.log | sort | uniq -c
> > 265 r8169
> > 1229 radeon
> > 142 rtl8192se
> > #
> > # grep -Eo 'radeon|r8169|rtl8192se' L2.log | sort | uniq -c
> > 187 r8169
> > 3159 radeon
> > 124 rtl8192se
> > #
> > # grep -Eo 'radeon|r8169|rtl8192se' L3.log | sort | uniq -c
> > 41 r8169
> > 1894 radeon
> > 39 rtl8192se
> > #
> > # grep -Eo 'radeon|r8169|rtl8192se' L4.log | sort | uniq -c
> > 64 r8169
> > 3370 radeon
> > 77 rtl8192se
> > #
> > # grep -Eo 'radeon|r8169|rtl8192se' L5.log | sort | uniq -c
> > 52 r8169
> > 2597 radeon
> > 49 rtl8192se
> > #
> >
> >
> > Btw, at some point (3.19.4) I encountered this:
> > [21631.181909] DMA-API: debugging out of memory - disabling
> >
> > Jake
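
For anyone chasing a similar problem: the per-driver grep counts quoted above can be compared across snapshots automatically. What follows is only a rough sketch, not part of dump_dma or of any kernel tooling. The helper names (count_drivers, grows_steadily, suspects) are made up for illustration, and "strictly increasing in every snapshot" is an assumed, fairly crude heuristic for "looks like a leak". The numbers are the ones quoted in this thread.

```python
import re
from collections import Counter


def count_drivers(log_text, drivers):
    """Count driver-name occurrences, similar to grep -Eo 'a|b' | sort | uniq -c."""
    # Longest names first, so a short name can never shadow a longer one.
    ordered = sorted(drivers, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    return Counter(pattern.findall(log_text))


def grows_steadily(counts):
    """True if each snapshot has strictly more entries than the one before."""
    return len(counts) > 1 and all(a < b for a, b in zip(counts, counts[1:]))


def suspects(snapshots):
    """Return the drivers whose entry count rose in every successive snapshot."""
    return sorted(name for name, counts in snapshots.items()
                  if grows_steadily(counts))


# Per-snapshot counts as quoted in this thread (five snapshot logs per kernel).
SNAPSHOTS_3_19_4 = {
    "r8169": [62, 33, 54, 49, 34],
    "radeon": [4183, 5582, 7007, 7429, 9360],
}
SNAPSHOTS_3_17_8 = {
    "r8169": [265, 187, 41, 64, 52],
    "radeon": [1229, 3159, 1894, 3370, 2597],
    "rtl8192se": [142, 124, 39, 77, 49],
}

if __name__ == "__main__":
    print("3.19.4 suspects:", suspects(SNAPSHOTS_3_19_4))
    print("3.17.8 suspects:", suspects(SNAPSHOTS_3_17_8))
```

With the counts above, only radeon on 3.19.4 gets flagged. That by itself does not prove a leak (nothing in this thread confirms one in radeon); it only says the entry count never dropped between snapshots, which is the same pattern rtl8192se showed before the fix.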