Re: [RFC] get_write_access()/deny_write_access() withoutinode->i_lock

From: Al Viro
Date: Mon Jun 20 2011 - 12:42:25 EST


On Mon, Jun 20, 2011 at 09:22:32AM -0700, Linus Torvalds wrote:
> On Mon, Jun 20, 2011 at 9:13 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Er... ?The current mainline does atomic_read() followed by atomic_inc(),
> > so we get the same thing (plus the spin_lock()/spin_unlock()), don't we?
>
> Yes. Unless the spinlock is in the same cacheline. No reason not to
> fix that, though.
>
> Of course, if the "ETXTBUSY" case is the common case (which I doubt),
> then not doing the write at all would be the optimal case. But I doubt
> that case is even worth really worrying about ;)

It isn't, unless your box is spinning in attempts to do something like
opening /bin/sh for write. In which case you've got worse problems ;-)
>
> > For get_write_access() it's probably the right assumption for everything but
> > /dev/tty*; for deny_write_access() it's not - a lot of binaries are run by
> > more than one process...
>
> Note the fact that EVEN IF WE GUESS INCORRECTLY, performance is likely
> better by guessing rather than reading, unless you know the thing is
> already in the local CPU cache.
>
> Doing the loop twice instead of once is still *much* faster than an
> extra cache transaction that goes to the bus (or L3 or whatever).
>
> > FWIW, I wonder what will the things look like on ll/sc architectures;
>
> There are no ll/sc architectures worth worrying about, so I don't
> think that's the primary concern. That said, I don't disagree with
> creating a "atomic_inc_unless_negative()" helper.

OK... Let me see if I got it right:
static inline int atomic_inc_unless_negative(atomic_t *p)
{
int v, v1;
for (v = 0; v >= 0; v = v1) {
v1 = atomic_cmpxchg(p, v, v + 1);
if (v == v1)
return 1;
}
return 0;
}
with get_write_access(inode) becoming
return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBUSY;
and similar for atomic_dec_unless_positive()/deny_write_access()?

BTW, atomic_add_unless/atomic_inc_not_zero is done as read/cmpxchg on
everything I've checked, including alpha and sparc. I suspect that
it's seriously suboptimal there...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/