Re: Question about pseudo filesystems

From: Daniel Phillips (phillips@arcor.de)
Date: Mon Sep 09 2002 - 20:40:31 EST


On Tuesday 10 September 2002 02:44, Jamie Lokier wrote:
> Daniel Phillips wrote:
> > It wasn't obvious to you, was it. So how can you call it simple.
>
> I have to agree.
>
> > [...] This doesn't cover the whole range of module applications.
> > There is a significant class of module types that must rely on
> > sheduling techniques to prove inactivity. My suggestion covers both,
> > Al has his blinders on.
>
> Unfortunately having cleanup_module() return a value don't necessarily
> make things simpler. Sure, it's a general solution, but it's not always
> easier to use.

It's always as easy to use, or easier. It's certainly more obvious.

> Typically, your module's resources are protected by a lock or so.
> cleanup_module() could take this lock, check any private reference
> counts, and (because it has the lock) decide whether to proceed with
> unregistering the module's resources. Once it begins unregistering
> resources, it's pretty committed to unregistering them all and saying it
> exited ok.

And failing silently if it can't?

> Unfortunately, once it has the lock, and the reference counts are all
> zero, it's still _not_ generally safe to cleanup up a module.
>
> This is because any other function, for example a release() op on a
> file, or a remove() op on a PCI device, can't take a module's private
> lock, decrement the private reference count, release the lock and
> return. There's a race between releasing the lock and returning, where
> it still isn't safe to remove the module's memory.
>
> Even waiting for a schedule to happen won't help if CONFIG_PREEMPT is
> enabled.

That's exactly the race that is removed by having the module subsystem call
__exit to remove the module. Since the module subsystem checks the __exit's
flag on return and releases the lock, so there is no window after releasing
the lock when the releasor is still executing in the module.

> In other words, the module's idea of whether it's own resources are no
> longer in use _must_ be released by code outside the module - or at
> very least, locks protecting that information must be released outside
> the module.

Yup, that's exactly what I've proposed, in the simplest way possible.

> > Designs are not always correct just because they work.
>
> Unfortunately it's not immediately clear to me that having
> cleanup_module() be able to abort an exit actually helps.

Silent failure is about the worst thing that we could possibly design into
the system, but that's not even the issue I'm going on about - because I
think it's so appallingly obvious it's wrong, I assume everybody can see that.

Rather, it's the simple fact that this is the obvious interface a naive
person would expect, and nobody has presented a rational argument for not
using it.

> Doing so with RCU-style "wait until none of my module functions could
> possible be in their race window" might work, though. If you could 100%
> trust GCC's sibling call optimisation, variations on
> `spin_unlock_and_return' and `up_and_return' could be written.
>
> But even if you can write code that's safe, is it likely to be
> understood by most module authors? If not, it's no better than Al
> Viro's filesystem method.

It solves one of the races in a tidy, obviously correct way. There are other
races that are much more difficult (which don't for the most part apply to
filesystem modules) where we have to do cute things to be sure that all
threads are out of the module, however, that is an othogonal issue, and when
last sighted, it had a workable solution on the table, which requires each
task to schedule once. Config_preempt is a trivial issue: just increment the
preempt counter and nobody will preempt on you while you run the magic
quiescence test. The module runs this test, by the way, so that only modules
that can't figure this out by some less intrusive means have to impose
themselves on the rest of the system this way.

Anyway, this does not replace Al's filesystem method, it *uses* it, but only
where appropriate.

Of course we can design a more complex method for accomplishing the same
thing, but why? Have you looked at the module.c by the way? If you have and
you like it, you are one sick puppy ;-)

-- 
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Sep 15 2002 - 22:00:20 EST