Repeatable File Corruption (ECS K7S5A w/SIS735)

From: dan (lung@theuw.net)
Date: Thu Oct 25 2001 - 21:21:47 EST


===== SUMMARY

I have found a file corruption issue when writing files seemingly related
to my new motherboard.

===== HARDWARE

  It is repeatable and verified on other boards of the same model. This
just started happening when I upgraded the system. The following is a
link to the ECS K7S5A board in question, the SIS735 chipset, and a
hardware description:

  http://www.ecsusa.com/ecsusa/www.ecs.com.tw/products/k7s5a.htm
  http://www.sis.com/products/chipsets/oa/socketa/735.htm

  ECS K7S5A motherboard
  AMD 1.4Ghz Tbird
  2x256Mb Micron DDR PC2100
  hda: ST36421A, ATA DISK drive
  hdb: QUANTUM FIREBALLP LM10.2, ATA DISK drive
  hdc: IC35L060AVER07-0, ATA DISK drive

===== DESCRIPTION

The bug was first noticed when running a game server that verifies client
map files with the server map files. Some of the maps, no matter what I
did, were always different, causing the server to reject clients. Using
the commands md5sum and cmp, I noticed that each time I copied a new file
into place, it was slightly different:

  $ cp de_aztec.bsp de_aztec.bsp2
  $ cmp de_aztec.bsp de_aztec.bsp2
  de_aztec.bsp de_aztec.bsp2 differ: char 2287174, line 6590
  $ md5sum de_aztec.bsp de_aztec.bsp2
  c138a969edf84abcfc5fe4b4b2a2c5ed de_aztec.bsp
  36b5bdd244d69a8d0fb3c81ba25e1df1 de_aztec.bsp2

Not only that, every copy is different from every other copy:

  $ cp de_aztec.bsp de_aztec.bsp3
  $ cp de_aztec.bsp de_aztec.bsp4
  $ cp de_aztec.bsp de_aztec.bsp5
  $ md5sum de_aztec.bsp de_aztec.bsp?
  c138a969edf84abcfc5fe4b4b2a2c5ed de_aztec.bsp
  fe858630be18e37eba8ea73a6be941c9 de_aztec.bsp2
  87bd1ad8b0ff2c58ebff5372f865fa35 de_aztec.bsp3
  7cffbffca0600c80e6cae336f52c885b de_aztec.bsp4
  91287ef7c8eb047d8f305315b4e0ce55 de_aztec.bsp5

It does not seem to be an issue with reading from the drive because the
md5sum of the original file is always the same.

This is repeatable with any command that will write a file out including
tar, rsync, and ftp. If I move the file within the partition, there is no
corruption. Copying the file one character at a time works.

This does not happen with every file on the drive. In fact, out of the
1400 files related to the server, I only found a few dozen that I could
repeat this issue with. I also have complaints from other users of the
system of files being corrupt and have found a number myself.

I have repeated this with the following:

  Duplicate systems (I have 5)
  Multiple hard drives (IBM, Quantum, Seagate)
  Different filesystems on multiple drives (reiserfs, ext2)
  With and without different drivers in the kernel (IDE chipset support)
  Kernels 2.4.10 through 2.4.13
  Drives running at different speeds (ATA-100, 66, 33)
  Slow processor (AMD 1.2GHz Tbird)

===== WORKAROUND

The problem only went away when I replaced the motherboard. I also
haven't had any file corruption issues running Windows2000 on the same
hardware with the same files. I moved all of the hardware in the original
system to a new motherboard (ASUS A7A266) and the problem went away.

I have CC'd the IDE chipset maintainer because I can only assume it might
be related.

Thank you for fighting through this email, I know it was long and boring.

dan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Oct 31 2001 - 21:00:29 EST