Re: Kernel deadlock using nbd over acenic driver.

From: Oliver Xymoron (oxymoron@waste.org)
Date: Thu May 16 2002 - 11:22:13 EST


On Thu, 16 May 2002, Peter T. Breuer wrote:

> "A month of sundays ago Oliver Xymoron wrote:"
> > On Thu, 16 May 2002, Peter T. Breuer wrote:
> > > "Oliver Xymoron wrote:"
> > > > If the system runs out of memory, it may try to flush pages that are
> > > > queued to your NBD device. That will try to allocate more memory for
> > > > sending packets, which will fail, meaning the VM can never make progress
> > > > freeing pages. Now your box is dead.
> > > The system can avoid this by
> > >
> > > a) not flushing sync (i.e. giving up on pages that won't flush immediately)
> > > b) being nondeterministic about it .. not always retrying the same
> > > thing again and again.
> >
> > Helpful but insufficient. What is to stop getting into a situation where
> > _all_ memory that is pageable is only pageable via network? Even if you
>
> OK, I agree. The socket (or the process using the socket) needs a
> reserve of memory that it can call upon in order to complete each
> individual network send, and that the rest of the system cannot touch.
>
> > have a big box, if you do large streaming writes to about 20 NBD devices,
> > you'll discover that each device queue can hold many megabytes of dirty
> > data.. Try pulling out your ethernet cable for a moment and watch the
> > thing strangle itself.
>
> Any way of making sure that send_msg on the socket can always get the
> (known a priori) buffers it needs?

Not at present. Note that we also need reservations on the receive side
for ACK handling which is "interesting".

> OTOH, if there is even a single other thread anywhere holding pages
> that we can reclaim, then we can find them by using an async stochastic
> algorithm inside the VM, instead of the current sync, deterministic one,
> surely!

Yes - falling over at the first hard to push page is bad.

-- 
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu May 23 2002 - 22:00:12 EST