Re: [PATCH] thread wakeup fix for 2.4.0-test7

From: kuznet@ms2.inr.ac.ru
Date: Mon Aug 28 2000 - 11:32:50 EST


Hello!

> Should we declare read() broken because of (while true; do; done)|grep?

[ Side note. Al, I've said, it is poor demagogy.

  I argue particularly to get some sane arguments _against_ f_users,
  which could assure myself. I am not interested in propaganda,
  I am enough skillfull in it myself. 8) I learned this skill
  because I had to assure people that a bug is their one,
  rather than not mine.
]

So, let's use technical terms yet.

> SCM_RIGHTS cookies. mmap. fork(), for that matter.

Nope. They cannot use fput(), exactly by the same reason why
fork() increases mm_users in addition to mm_count.

> > Also, lingering (i.e. waiting for looong time) in fput() smells
> > interesting. 8)
>
> Yes? Receiving process doesn't care to do recvmsg() on SCM_RIGHTS
> datagram. What should happen on close(), again?

Nothing. close() decreases f_users and exits, because f_users is not zero.

> > > Yes? So how about
> > > fd2 = dup(fd1);
> > > read(fd1,...);
> > > close(fd1);
> > >
> > > Is the file closed in that case? If no - you've got completely weird
> > > semantics (having dup() changes rules for operations on original fd), if
> > > yes - what happens with your arguments about leaks?
> >
> > Dup gets another descriptor and another reference.
>
> So _which_ case should abort read()?

No case. f_users is not zero, nothing happens.

> Alexey, you are mixing files and descriptors. Look: you want to add some
> strange state to the whole thing. Fine. Where should it live? If it's
> associated with file - too bad, dup() will prevent aborting read() by
> close(). If it's on descriptor - WTF does it have struct file?

Wow! You missed the whole problem. 8)

The problem happens only when threads share file table.

thread A thread B

read(fd);
sleeps...
                                close(fd);
read returns -EBADF
(poll returns POLLNVAL), if this file has no more users.

What happens now is race window: if close(fd) happens before
read(fd) grabbed fd or after it fput() it, we get EBADF,
otherwise read() hangs forever. (Actually, even absence of invariance
can be considered as bug.)

Note that problem does not happen with normal files, which
are stateless. It is problem of stateful files: sockets, pipes etc.
They need fops->close() to be called to be closed and it is not
called because read() etc. hold file. Dead loop. And it is evident
how to break this loop.

What's about implementation, it is very easy. Actually code
becomes more clean than it is now. Each time, when you create
new client reference to file increase f_users. Client references
are those references, which mean that file is still alive:
open(), socket(), dup() etc. etc. There are a few of places,
when file is kept alive without any file descriptor f.e. sockets
used by nfs, also mmaped files and SCM_RIGHTS.

When closing a descriptor, you mark it unused and
do cleanups associated to descriptor: locks, flushing signals
(or sending special signal, instead), then:

error = 0;
if (atomic_dec_and_test(&file->f_users)) {
        if (fops->shutdown)
                error = fops->shutdown(file);
}
fput(file)
return error;

where fput() is:

if (atomic_dec_and_test(&file->f_count)) {
        if (fops->close)
                fops->close(file);
        free_filp(file);
}

fops->shutdown is better name for fops->flush().
fops->flush() in its current form seems to be meaningless.

This method is nil for stateless files. For sockets
it refers to shutdown(2) followed by linger, if linger
is enabled. It may block and may return error.

fops->close() does not block, though may sleep.

That's all. fget() does lookup in file table and increases both f_users
and f_count under fd table read_lock. Actually, it is better to name
file_lookup(), but the name is selected.

file_hold(struct file *) does simply atomic_inc(&file->f_count),
it is used when file is already held by caller to get more references.

It is usual scheme, it is used _everywhere_ over the kernel.
Frequently "users" count is implicit, because object is referenced
from fixed places (some hash tables etc.) and scheduled for death,
when it is removed from these tables. But it is required for
objects referenced from multiple varying places, which is the case
for struct file.

Alexey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Aug 31 2000 - 21:00:21 EST