Fw: Locks used in the FAT file system are non-atomic and in fact, don't work on SMP systems

Jeff Merkey (jmerkey@timpanogas.com)
Thu, 26 Aug 1999 11:23:45 -0600


Linus,

You are also doing this in locks.h, in the functions lock_super() and
unlock_super(). Am I missing something here? We used this same method
and got corrupted data on SMP systems. It is possible for two processes
to blow up here by entering the function at the same time while the lock
variable is zero. It's hard to reproduce (we have to perform cyclic
copies with 8+ processes on a 4-processor system for over two hours to
reproduce it), but there is a hole here if we use these locking
primitives the way you have defined them in locks.h.

Comments?

Please advise.

Jeff

----- Original Message -----
From: Jeff Merkey
To: linux-kernel@vger.rutgers.edu
Sent: Thursday, August 26, 1999 10:44 AM
Subject: Locks used in the FAT file system are non-atomic and in fact, don't work on SMP systems

We had attempted to use the FAT version of locks with wait queues, but
have discovered they are non-atomic and, under very heavy load, allow
shared data corruption on SMP systems. They also have some subtle race
conditions even on non-SMP systems in reentrant code. We are using
atomic semaphores now instead. Just thought we would warn folks that
what's out there appears to be busted.

The offending code is:

Lock()
{
   while (lock) sleep_on(&wait);
   lock = 1;
}

Unlock()
{
   lock = 0;
   wake_up(&wait);
}

Two processes can enter Lock() while lock is equal to 0, both pass the
while test before either one sets the flag, and both set it. We have
seen this occur, and it seems broken.
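
For illustration, here is a rough user-space sketch of the same Lock()/Unlock()
shape with the test and the set made into one atomic step. It is not the
kernel code and not what we shipped: it assumes POSIX threads, and the names
sleep_lock, sleep_lock_acquire(), sleep_lock_release(), and run_workers() are
made up for this example. A mutex guards the flag, and a condition variable
stands in for the wait queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space analogue of the Lock()/Unlock() pair above. */
struct sleep_lock {
    pthread_mutex_t guard; /* makes "test held, then set held" one atomic step */
    pthread_cond_t  wait;  /* stands in for the wait queue */
    int             held;  /* stands in for the `lock` flag */
};

static void sleep_lock_init(struct sleep_lock *l)
{
    pthread_mutex_init(&l->guard, NULL);
    pthread_cond_init(&l->wait, NULL);
    l->held = 0;
}

static void sleep_lock_acquire(struct sleep_lock *l)
{
    pthread_mutex_lock(&l->guard);
    /* The wait releases `guard` while sleeping and retakes it on wakeup,
     * so no other thread can slip in between the test and the set. */
    while (l->held)
        pthread_cond_wait(&l->wait, &l->guard);
    l->held = 1;
    pthread_mutex_unlock(&l->guard);
}

static void sleep_lock_release(struct sleep_lock *l)
{
    pthread_mutex_lock(&l->guard);
    assert(l->held);              /* releasing a lock nobody holds is a bug */
    l->held = 0;
    pthread_cond_signal(&l->wait); /* wake one sleeper, like wake_up() */
    pthread_mutex_unlock(&l->guard);
}

struct worker_arg {
    struct sleep_lock *lock;
    long *counter;
    int iters;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < a->iters; i++) {
        sleep_lock_acquire(a->lock);
        (*a->counter)++;          /* shared data protected by the lock */
        sleep_lock_release(a->lock);
    }
    return NULL;
}

/* Runs up to 64 threads that each bump a shared counter `iters` times
 * and returns the final count. */
static long run_workers(int nthreads, int iters)
{
    enum { MAX_THREADS = 64 };
    pthread_t tid[MAX_THREADS];
    struct sleep_lock lock;
    struct worker_arg arg;
    long counter = 0;
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    sleep_lock_init(&lock);
    arg.lock = &lock;
    arg.counter = &counter;
    arg.iters = iters;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[started], NULL, worker, &arg) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

With the flag only ever examined and changed while guard is held, a second
caller cannot see held == 0 after the first caller has already decided to
take the lock, which is exactly the window the original Lock() leaves open.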

Jeff

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/