Re: 2.4.35 SMP: ext3_readdir: bad entry in directory #323888: rec_len is smaller than minimal

From: Willy Tarreau
Date: Thu Sep 20 2007 - 09:21:29 EST


Hi Ville,

On Thu, Sep 20, 2007 at 03:45:37PM +0300, Ville Herva wrote:
> On Tue, Sep 18, 2007 at 11:47:05PM +0200, you [Willy Tarreau] wrote:
> > Thanks for your report. Unfortunately, I've rechecked the recent changelogs
> > and see nothing related either. At least, in order to keep trace of the
> > incident, would you please post some info about your config (CPU, RAM,
> > chipset, .config, gcc, and any possible patches you may have applied) ?
> > Maybe some of these info may remind old bad memories to some people.
> >
> > Also, do you know if this server has ECC memory ? I would more easily
> > bet for side effects of one random bit flip in memory than for some
> > massive block corruption.
> >
> > I vaguely remember about very old reports of people sometimes observing
> > zeroed out blocks during writes, which were attributed to chipset bugs
> > if my memory serves me. But I would rule this out as recent chipsets
> > look more stable than 5-10 years ago !
>
> Willy,
>
> The machine is a virtual machine on an VMware ESX 3.0.1 host.
>
> /proc/cpuinfo shows two of these:
> Dual
> model : 15
> model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
> stepping : 8
> cpu MHz : 2333.014
> cache size : 64 KB
>
> It has 864MB of memory.
>
> .config is at:
> http://v.iki.fi/~vherva/tmp/2.4.35-config
> The kernel is plain vanilla 2.4.35 from kernel.org, no patches.

OK. And your config seems perfectly standard.

> gcc 2.96-129:
> cat /proc/version
> Linux version 2.4.35 (root) (gcc version 2.96 20000731 (Red Hat Linux 7.2 2.96-129.7.2)) #1 SMP Thu Aug 9 10:35:37 EEST 2007

I used not to trust 2.96, but I wouldn't accuse it now.

> Memory is ECC.
>
> The server is HP Proliant ML370 with 82801BA/CA/DB/EB chipset. I've had my
> share of chipset bugs with older Via chipsets, but I think it's very likely
> in this case.

I think you meant "unlikely".

> This could very well be a VMware bug, but I wanted to know if this rings
> bells for someone.

It could also be a problem with the host OS, drivers, hardware, etc...

Cheers,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/