2.6.10 - VFS is out of sync with lock manager!

From: Anders Saaby
Date: Wed Jan 12 2005 - 10:25:54 EST


Hi Trond,

Yesterday i posted this to LKML (but my mailreader theated me, and didn't keep
thread info):

(I am very sorry if you have already seen my previous mail - I don't want to
bother you unnessary!)

->

I have seen the exact same error on one of my webservers which is serving
from an NFS export and under heavy load. ~2 hours uptime before panic'ing.
I then tried Trond's patch which seems to work. 14 hours of uptime now. :)

Anyways, I have a couple of issues you might be able to clear up for me:

First issue:
New strange message in the kernel log:

"nlmclnt_lock: VFS is out of sync with lock manager!"

- What does this mean? - Is it bad?, What can i do?


Second issue:
my fs/nfs/file.c doesn't look like yours (Vanilla 2.6.10):

<fs/nfs/file.c SNIP>
        status = NFS_PROTO(inode)->lock(filp, cmd, fl);
        /* If we were signalled we still need to ensure that
         * we clean up any state on the server. We therefore
         * record the lock call as having succeeded in order to
         * ensure that locks_remove_posix() cleans it out when
         * the process exits.
         */
        if (status == -EINTR || status == -ERESTARTSYS)
                posix_lock_file_wait(filp, fl);
        unlock_kernel();
        if (status < 0)
                return status;
        /*
         * Make sure we clear the cache whenever we try to get the lock.
         * This makes locking act as a cache coherency point.
         */
        filemap_fdatawrite(filp->f_mapping);
        down(&inode->i_sem);
        nfs_wb_all(inode);      /* we may have slept */
        up(&inode->i_sem);
        filemap_fdatawait(filp->f_mapping);
        nfs_zap_caches(inode);
        return 0;
</SNIP>

So... Am I missing another patch or something else?

Jan-Frode Myklebust wrote:

> On Wed, Jan 05, 2005 at 10:54:03PM +0100, Trond Myklebust wrote:
>>
>> Looking at the NFS code, I can attempt a wild guess about what may be
>> happening: there may be a race when pressing ^C in the middle of a
>> blocking NFS lock RPC call, and if so, the following patch will fix it.
>
>
> A whopping 9 hours of uptime now :) So the one-liner patch seems to have
> fixed it.
>
> Thanks!
>
>> -   posix_lock_file(filp, fl);
>> +   posix_lock_file_wait(filp, fl);
>
>
>   -jf

--
Med venlig hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer
------------------------------------------------
Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: as@xxxxxxxxxxxx - http://www.cohaesio.com
------------------------------------------------
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/