Re: VFS/VM - mmap/write deadlock

Ben LaHaise (bcrl@redhat.com)
Tue, 21 Dec 1999 00:33:26 -0500 (EST)

Next message: Jim Nance: "Bug in write() in 2.3.33 w/ testcase"
Previous message: David Parsons: "Re: Possible workaround for buggy E801 call in 2.2.x"

On 20 Dec 1999, Linus Torvalds wrote:

> In article <14430.57171.555030.958940@dukat.scot.redhat.com>,
> Stephen C. Tweedie <sct@redhat.com> wrote:
> >
> >I've just been talking with Ben LaHaise about this, and a complete fix
> >really will need to ensure that the semaphore taken out by write_page()
> >against truncates happening during the write-back _has_ to be compatible
> >with any locks taken inside the generic write path.
>
> Quite frankly, any fix for this is probably going to be more unstable
> than just waiting for 2.4.x which has all of this fixed by virtue of
> just doing the memory mapping right. I don't think it's very productive
> playing with the innards of 2.2.x that much: the whole point of 2.2.x is
> to NOT do major changes, and fixing the deadlock on inode->i_sem _is_ a
> major change.

We are talking about 2.4 -- right now one can deadlock by write()ing from
an mmap()'d region of a file into that same file. The granularity of the
lock is much smaller (it's the page lock), but the deadlock is still
there. What I was talking about with Stephen is making inode->i_sem a rw
semaphore in order to protect inode->i_size from changing (e.g. due to a
truncate) while the file is being accessed (i.e. protect it on reads as
well, to prevent false mappings of pages lingering in the page cache
after a truncate), while at the same time allowing simultaneous writers
to a given file.
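For concreteness, the user-space pattern that triggers the deadlock can be
sketched as below. This is only an illustration: the helper name
write_from_own_mapping() and the path argument are made up for the example,
and on kernels where the problem has been fixed the call is expected to
return normally rather than hang.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file of `len` bytes, map it, then write
 * to the file using the mapping itself as the source buffer.
 * Returns the number of bytes written, or -1 on any failure.
 */
static ssize_t write_from_own_mapping(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /*
     * Source and destination are the same page of the same file: while
     * the write path works on the destination page, copying from `map`
     * may fault on that very page.
     */
    ssize_t n = pwrite(fd, map, len, 0);

    munmap(map, len);
    close(fd);
    return n;
}
```

Calling write_from_own_mapping("testfile", 4096) exercises the case where
the fault on the source hits the page the write is targeting.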

These are two separate issues. For the deadlock case, I think we could do
the following -- add a struct page *write_locked_page field to current
that is set to whatever page we've currently got locked for writing. If
we take a page fault and end up in filemap_nopage, we check whether the
page we want to lock for writing is the same as
current->write_locked_page. If it is, boom SIGBUS, deadlock caught.
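A rough user-space model of that check, just to make the idea concrete.
Nothing here is real kernel code: struct page and struct task are stand-ins,
and write_lock_page(), write_unlock_page() and model_nopage() are
hypothetical names for the write-path bookkeeping and the test that would
sit in filemap_nopage.

```c
#include <signal.h>
#include <stddef.h>

/* Stand-ins for the kernel structures; not the real definitions. */
struct page {
    int locked;
};

struct task {
    /* Page this task currently holds locked for writing, or NULL. */
    struct page *write_locked_page;
};

/* Write path: lock the destination page and remember it in the task. */
static void write_lock_page(struct task *t, struct page *pg)
{
    pg->locked = 1;
    t->write_locked_page = pg;
}

/* Write path: forget the page, then drop the lock. */
static void write_unlock_page(struct task *t, struct page *pg)
{
    t->write_locked_page = NULL;
    pg->locked = 0;
}

/*
 * Fault path: if the faulting page is the one this task already holds
 * locked for writing, waiting on it would deadlock, so report SIGBUS
 * instead. Returns 0 when it is safe to proceed.
 */
static int model_nopage(struct task *t, struct page *wanted)
{
    if (wanted == t->write_locked_page)
        return SIGBUS;
    return 0;
}
```

The comparison is per task, so a fault on a page locked by some other task
would still just wait as usual; only the self-deadlock is turned into an
error.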

As for the i_size protection, the idea is to have both read() and write()
take i_trunc_sem for read, while truncate takes it for write. This
prevents buffers from being unmapped out from under us, as well as
read()/write() putting fresh buffers in under truncate. In addition,
we need a separate lock to serialize writes extending the file (for
O_APPEND). The question
was how to get these locks correct so as to prevent shooting ourselves in
the foot when trying to fix the deadlock, while at the same time giving us
the most parallelism possible from the reworked page cache.
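The compatibility rules for i_trunc_sem can be modelled in a few lines.
This is a single-threaded state model with trylock semantics, not a real
semaphore: struct rwsem_model and the model_* functions are invented for
the illustration, and the separate lock for extending writes is left out.

```c
/* Toy model of a reader/writer semaphore: counts holders, never sleeps. */
struct rwsem_model {
    int readers;  /* number of holders in read mode (read()/write()) */
    int writer;   /* 1 if held in write mode (truncate) */
};

/* read() and write() would take the semaphore this way. */
static int model_down_read_trylock(struct rwsem_model *s)
{
    if (s->writer)
        return 0;  /* truncate in progress: would have to wait */
    s->readers++;
    return 1;
}

static void model_up_read(struct rwsem_model *s)
{
    s->readers--;
}

/* truncate would take the semaphore this way. */
static int model_down_write_trylock(struct rwsem_model *s)
{
    if (s->writer || s->readers)
        return 0;  /* I/O in flight or another truncate: would wait */
    s->writer = 1;
    return 1;
}

static void model_up_write(struct rwsem_model *s)
{
    s->writer = 0;
}
```

Any number of read()/write() callers can hold the semaphore together, while
truncate gets in only when none of them do, and keeps them all out until it
is finished.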

Is this sounding at all reasonable?

-ben

