linux v2.2.15pre13 filesystem/kernel race and klog printk data loss.

From: M Sweger (mikesw@whiterose.net)
Date: Thu Mar 09 2000 - 16:45:15 EST


Hi,
    I've compiled and installed linux v2.2.15pre13 using egcs v1.0.3
and libc6 v2.0.7 with make v3.77. I have a Dell optiplex 333mhz
with 128megs and a 2940U2W SCSI card with AIC7895 and a 9.1Gig
Western digital disk drive. I have applied all the aic7xxx patches
up to v5.1.28.

     Here are the following problems I'm encountering
on a UMSDOS filesystem. I don't know if the ext2 filesystem will have the
same problem. BTW, this occurs whether SMP, raid0-5, and dquota is
selected or not, although I only have a uniprocessor system.

PROBLEM #1:
       a). when I do a "ls -l /DOS" I get an "Umsdos: Oops releasing
            pseudo root, Notify .... " message due
            to the pseudo root inode equaling the real root. If "ls -d /DOS"
            or "ls -lt" is done in "/", there is no Oops since it
            doesn't look inside the /DOS subdirectory.

            This wasn't a problem in linux 2.2.1 kernels.
            Kernels 2.2.2-12 I don't know about since I had problems running
            these.
            Kernels 2.2.13, 2.2.14, 2.2.15pre3/pre5/pre7/pre9/pre10/pre13 have
            this Oops problem.

            I'm not sure if ext2 which may have a /DOS on it, if it has
            this problem too. Can somebody verify this?

            Partial Solution:
                To track this down, I started putting printk(KERN_INFO...
                messages into every UMSDOS function call upon entry into
                the function. When I ran this, the Oops disappeared!!!!
                Thus there is a timing problem somewhere in the filesystem
                or VFS filesystem/kernel. There has been very few
                changes to the UMSDOS filesystem since kernel 2.2.1, thus
                the UMSDOS filesystem shouldn't be at fault.

                I tried narrowing down the search by reducing the number
                of printk(KERN_INFO.... in each file, but it's
                indeterminate as to which one eliminates the problem.

        b). Another problem that hampers finding item (a) above, is
             that when printk's try to record via klogd and syslog.conf
             to /var/log/kernel, not all printk's are recorded. Some
             printk's will have a partial message written or a
             printk will overwrite the previous printk in the middle
             of the first printk message.

             Thus, there seems to be a race condition whereby the message
             isn't completely "flushed" to the output file before the
             next one comes in. If klogd isn't started and there is
             alot of printk's being queued up, waiting to be output, the
             kernel will hang. This is true on bootup if tons of printk's
             are output for recording vs. just a few. If klogd is started,
             and there are alot of printk statements in the code for
             tracing purposes to see what the code is doing, the various
             printks may not even be recorded at all. This was the
             case when I put alot of printk's in the dcache.c file and at
             the same time in the functions for the UMSDOS filesystem
             recording which functions were entered, and when files where
             unlinked or rmdir'd.

             Somebody suggested that I send klogd output to another
             Unix machine which is stable and thus there wouldn't be
             any race condition to the file on the disk - assuming
             the kernel itself didn't lose it. Although this is
             a good idea, the SA's I work with won't let me redirect
             the 500k output file to the Sun machines to verify this
             assumption. :(

             I tried the following to bypass syslogd but it only
             made it worse. :( :(
                        Does a system call instead of reading from /proc/kmsg
                        and writes directly to the file bypassing syslogd.
                klogd -p -s -f /var/log/kernel
                        Doesn't do a system call but writes /proc/kmsg
                        directly to a file bypassing syslogd.
                klogd -p -s
                        Does a system call instead of reading /proc/kmsg
                        and write via syslogd. This records a little more
                        but "pg kernel" thinks it's a binary file???
                        This uses syslog.
                klogd -p -f /var/log/kernel
                        No expected change although this wasn't tried.
                klogd -p
                        This one seems to write the MOST information but
                        still drops alot and uses syslog

                NOTE: my syslog.conf has this in it

                kern.* /var/log/kernel

PROBLEM #2:

       c). Another race condition occurs (and which item(b) applies),
             is when the "make mrproper" is run after doing a kernel build.
             This is the only place it has occurred reliably for me and
             where I have noticed the problem. The problem is that
             I get a "Device or resource Busy" error message due to -EBUSY being
             returned in file namei.c function UMSDOS_rmdir. The
             "make mrproper" gets to "rm -rf include/config" and then
             gives the following errors,

rm: cannot remove directory `include/config/bsd/process': Device or resource busy
rm: cannot remove directory `include/config/bsd': Device or resource busy
rm: cannot remove directory `include/config/blk/cpq': Device or resource busy
rm: cannot remove directory `include/config/blk': Device or resource busy
rm: cannot remove directory `include/config/paride': Device or resource busy
rm: cannot remove directory `include/config/chr/dev': Device or resource busy
rm: cannot remove directory `include/config/chr': Device or resource busy
rm: cannot remove directory `include/config/scsi/generic': Device or resource busy
rm: cannot remove directory `include/config/scsi': Device or resource busy
rm: cannot remove directory `include/config/ms': Device or resource busy
rm: cannot remove directory `include/config/vga': Device or resource busy
rm: cannot remove directory `include/config/mda': Device or resource busy
rm: cannot remove directory `include/config/sound': Device or resource busy
rm: cannot remove directory `include/config': Device or resource busy
make: *** [mrproper] Error 1

                Now if "make mrproper" is run a second time, there is no
                problem. There is no one in any of the subdirectories
                which may cause this too. If upon the first "make mrproper"
                one goes to these directories to see what's in them, they
                are all empty since the "rm" statement removed the files.
                Somehow, the dentry tree inodes *aren't* equal.

                Since the function "list_empty()" is returning FALSE,
                the above -EBUSY is returned. This would then point
                to the dentry in /usr/src/linux/fs in the file
                dcache.c. This code has changed alot and I'm running
                into the same problem as identified in item (b) -
                missing printks when they are inserted here.

                If "rm -rf include/config" is run manually, there is
                no problem. However, when run within "make" using
                "make mrproper" it gives these errors. Now if the
                Makefile is modified so that only this one "rm" is
                done and the rest commented out, it still has this
                error. Thus, the other rm's aren't influencing it
                or causing a dependency.

                When I try to specify a printk when "pruning" is
                occuring. I do see that dentry prunes seem to be
                occuring before the actual unlink of the directory:
                after that particular subdirectory has been emptied out.
                But, in some cases the pruning may be delayed till later
                and they may prune a couple of subdirs all at once.
                I'm not sure under what circumstance one or the other
                is selected - the immediate one or the delayed one.
                I've also noticed that some subdirs don't get pruned
                nor unlinked although they are deleted from the filesystem.
                I attribute this to skipping/losage of printk's being
                output to the /var/log/kernel due to klogd being slow
                or the kernel losing track of these messages, particularily
                since the UMSDOS filesystem executes the same path for
                all files/directories being removed.

QUESTION #1:
        I've noticed that when files get unlinked in a directory the
        directory the file is in is locked then unlocked; however,
        when it comes time to rmdir the subdirectory name (after being
        emptied) the parent diretory isn't locked/unlocked. Should it be???

UMSDOS filesystem maintainer QUESTION here,

        Another, possible problem is that UMSDOS uses the EMD file
        which is named --linux-.---. When it comes time to remove the
        subdir, it also removes this file. However, the subdir isn't
        locked whils't the EMD file is being removed, then the subdir
        is unlocked after the parent has been locked so that the subdir
        name can then be removed. This should prevent other processes
        from putting files into the subdir while at the same time
        the EMD file and subdir name is being removed. Possible
        filesystem corruption problem here I think!!!

        When symbolic/hardlinks are performed, then a semphore lock
        using the function down() / up() is used instead of directory
        lock/unlock. Is this the only scenario this is used for????
        It's unclear as to when semaphores or createlock should be used....
        Perhaps it's because a file/directory can be linked whils't
        at the same time another process it trying to remove the file/subdir
        and thus one needs two different forms of locking to tell them
        apart. I assume the order above can be reversed and thus the
        locking method(s) - whether by semaphore or createlock - are
        considered equal in the way they are used so that one doesn't
        have precedence over the other.

        The up() and down() semaphore locks are assembly code and I
        can't tell what's listening to the semaphore flag.

        The sleep_on() and wake_up() semaphore locks too. There is
        another one called wake_up_interruptible() which the filesystem
        doesn't use. When to use this one properly vs. the other?
         The same applies to interruptible_sleep_on() too.
        I seem to not be able to determine if this is assembly code also.

        I did fine ./kernel/sched.c which contains these functions. I
        assume that when called they loop till they are done before
        returning so that the calling routine (in the case the filesystem)
        doesn't have to loop checking a returned flag!

        So what is the basic difference between the two?

        Anyway I'm out of my league at those lower levels.

QUESTION #2:
        Are there any tricks/techniques to isolate the above problems
that may be faster or easier to do?

Note: the UMSDOS maintainer has been notified of the above problems
      but there is nothing he can do, since it all seems to be pointing
      towards the VFS/kernel code IMHO.

Thanks for any help :)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Mar 15 2000 - 21:00:16 EST