Re: [PATCH 0/12] Per-bdi writeback flusher threads v7

From: Jens Axboe
Date: Tue May 26 2009 - 16:47:24 EST


On Tue, May 26 2009, Damien Wyart wrote:
> > > I have been playing with v7 since you sent it, and after a while
> > > (short on laptop, longer on desktop, a few hours) writeback doesn't
> > > seem to work anymore. A manual call to sync hangs (process in D state)
> > > and the Dirty value in meminfo keeps growing. As previous versions had
> > > been heavily tested, I guess there is some regression in v7.
>
> > Not good, the prime suspect is the sync notification stuff. I'll take
> > a look and get that fixed. You didn't happen to catch any sysrq-t back
> > traces or anything like that? Would be interesting to see where
> > bdi-default and the bdi-* threads are stuck.
>
> No, as I was doing many things at the same time and not exclusively
> debugging, I just rebooted hard and went back to an unpatched kernel when
> the problems occurred. But I noticed only bdi-default was alive, the
> other bdi-* threads had disappeared and the sync commands I had tried
> were all in D state. Also I tried to reinstall a kernel .deb (these
> systems are Debian) and this got stuck during installation, when probing
> grub config (do not know if there is some sync syscall in there).
>
> Can try to go further tomorrow but will not have a lot of time...

OK, I spotted the problem. If we fall back to the on-stack allocation in
bdi_writeback_all(), then we wait for the work completion with the
bdi_lock mutex held. This can deadlock with bdi_forker_task(): if we
need that to be invoked to make progress (which happens if a thread
needs to be restarted), then it blocks on that mutex while we wait on
the completion, and neither side can proceed.
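
To make the lock ordering concrete, here is a minimal userspace sketch
using pthreads rather than kernel primitives. It is not the actual patch
code: list_lock (standing in for bdi_lock), forker() (standing in for
bdi_forker_task()) and submit_and_wait() are hypothetical names chosen
for illustration. It shows the safe order, where the lock is dropped
before waiting for completion.

```c
/* Userspace analogue only; every name below is hypothetical. */
#include <pthread.h>
#include <stdbool.h>

/* Stands in for bdi_lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Minimal "completion": a flag guarded by a mutex and condition variable. */
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static bool work_done;

/*
 * Stands in for the forker task: it has to take list_lock before the
 * queued work can make progress, and only then signals completion.
 */
static void *forker(void *arg)
{
	(void)arg;

	pthread_mutex_lock(&list_lock);
	/* ... a real forker would restart the worker thread here ... */
	pthread_mutex_unlock(&list_lock);

	pthread_mutex_lock(&done_lock);
	work_done = true;
	pthread_cond_signal(&done_cond);
	pthread_mutex_unlock(&done_lock);
	return NULL;
}

/* Queue work under list_lock, then wait for it. Returns 1 on completion. */
int submit_and_wait(void)
{
	pthread_t t;

	work_done = false;

	pthread_mutex_lock(&list_lock);
	/* The work is queued while the list lock is held. */
	if (pthread_create(&t, NULL, forker, NULL) != 0) {
		pthread_mutex_unlock(&list_lock);
		return 0;
	}
	/*
	 * Waiting for completion at this point, with list_lock still held,
	 * would hang: forker() blocks on list_lock and never signals.
	 * Drop the lock first, then wait.
	 */
	pthread_mutex_unlock(&list_lock);

	pthread_mutex_lock(&done_lock);
	while (!work_done)
		pthread_cond_wait(&done_cond, &done_lock);
	pthread_mutex_unlock(&done_lock);

	pthread_join(t, NULL);
	return 1;
}
```

Moving the pthread_mutex_unlock(&list_lock) call to after the wait loop
reproduces the hang in this sketch, which is the same shape as the
problem described above.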

I'll cook up a fix for this, but probably not before the morning.

--
Jens Axboe
