Re: NFS deadlock

From: Jaco Kroon
Date: Thu May 20 2004 - 03:11:45 EST


Oh, sorry.

The one box at home is currently running 2.6.5, with the intent of upgrading to 2.6.6 as soon as I can find the time. It uses ext3 as the underlying file system. The same goes for the single client it serves.

The one at the office, which serves up to a hundred or so clients, currently runs 2.6.4 with a patch for the dpt_i2o driver. It has an ext3 partition, but the one being served up via nfs is reiserfs. The clients are running 2.6.5 at the moment (anything between 0 and 320 clients max at any time, usually between 20 and 50 clients depending on lab usage). The other server is using 2.4.24 (waiting for the dpt_i2o driver in the 2.6 kernel) with ext3 file systems and, once again, reiserfs for the nfs-exported part of the file system. There is a variety of clients in this case, from 2.4.20 kernels right through to 2.6.6 kernels.

The machine that died yesterday is also running a 2.6.5 kernel with an ext3 file system. One of its two clients is the first of the two servers above, and the other runs kernel 2.6.6 as well.

What affects the regularity of the crashes seems to be the load placed on the server by its clients. In my case at home the client is considerably faster than the server, which puts a relatively high load on the server. I wish I had more time to check this out. I suspect some kind of race condition that gets triggered either by heavy system load or by a heavy skew between the speeds of the client and server. I might be totally wrong though ...

Transfers in our case are always between linux and linux (at least as far as we can control it; we are not aware of any other clients and would probably manage to get such a person expelled should we find him).

We've experienced the client lock-ups as well. The client eventually times out after a *long* time, but we usually bounce the server before that happens. This can be explained and is in my opinion quite normal.

What do sysstat and sar do? How can I use them to analyse the problem?

Jaco

samg@xxxxxxxxxxxxx wrote:

Jaco,

How are your boxes locking up? I have nfs in use every day.
Does rpc die?

What kernel are you using?
And are you transferring linux to linux, or to some other platform?

The only time I had problems was when my client locked up
because I disconnected the server, and it hung the client,
the only solution (based on the way I connected), was to reboot.
To make matters worse, I ran a script that used du every day, and
so there were 12+ instances of du, all trying to run about.

I would suggest using a program like sysstat, or sar, to help you
analyse the issues at hand.

-sam



Hello there

I've once again got problems with the kernel locking up. I'm now
convinced that it has something to do with NFS.

Previously we've had 2 machines that locked up, plus my one at home,
resulting in three machines. Sometimes they would recover by themselves
after some time, other times they could be left for 2 days or so without
recovering. All three of these use NFS to export files to other
machines. It's the only thing we can find they have in common, other
than the x86 architecture, but then other machines would be dying as well.
It should be noted that none of these runs on the newest hardware, but
that should not matter, since neither do any of our other servers. We have
a 3rd NFS server, which doesn't take nearly as heavy load via NFS. I've
been wondering why it hasn't locked up as well, and this morning (right
now in fact) it has decided that it is its turn and is currently
unusable.

If anybody else is experiencing similar problems, or has possible
workarounds, it would be appreciated if you could share your knowledge.

Jaco

===========================================
This message and attachments are subject to a disclaimer. Please refer to
www.it.up.ac.za/documentation/governance/disclaimer/ for full details.
Hierdie boodskap en aanhangsels is aan 'n vrywaringsklousule onderhewig.
Volledige besonderhede is by
www.it.up.ac.za/documentation/governance/disclaimer/ beskikbaar.
===========================================





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



--
"The strength of the Constitution lies entirely in the determination of each
citizen to defend it. Only if every single citizen feels duty bound to do
his share in this defense are the constitutional rights secure."
-- Albert Einstein