Re: [PATCH] bcache: Remove use of down/up_read_non_owner()

From: Steven Rostedt
Date: Wed Aug 21 2013 - 04:38:38 EST

Next message: Luis Henriques: "[3.5.y.z extended stable] Linux 3.5.7.20"
Previous message: Peter Zijlstra: "Re: [PATCH 04/10] sched, fair: Shrink sg_lb_stats and play memsetgames"
In reply to: Kent Overstreet: "Re: [PATCH] bcache: Remove use of down/up_read_non_owner()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, 20 Aug 2013 20:36:58 -0700
Kent Overstreet <kmo@xxxxxxxxxxxxx> wrote:

> I think we're mostly quibbling over semantics here, and it's not that
> big a deal - that said, I don't really get your distinction between a
> semaphore and a mutex; I'd distinguish them by saying a semaphore can
> count an arbitrary number of resources (the number it's initialized to),
> and a mutex is always initialized to one - the fact that in the kernel
> mutexes must be released by the same process that took them (while
> important for PI and debugging) is not fundamental.

Understood. As, I usually am focused on PI and debugging, it may be
more fundamental to me than others.

>
> "rw semaphore" never made any sense to me; they're not counting
> anything, like regular semaphores. They're just sleepable rw locks.

Me either.

>
> So it _sounds_ like you're saying bcache is using it as a semaphore,
> beacuse it's using it as a lock we don't release in the original
> context? in analogy to linux mutexes and semaphores? Not quite sure what
> you mean.

Actually, I'm thinking you are using it more as a completion than a
semaphore.

> > I have a hack that actually implements a work around for non_owner, but
> > it adds overhead to all locks to do so.
>
> Ok - I'd just be curious why the PI bit can't be factored out into a
> wrapper like how the debug stuff is handled with the _non_owner bits,
> but I can certainly believe that.

The problem is that all sleeping locks go through rt_mutex. The locks
are replaced with locks that incorporate PI.

> To simplify the algorithm just a bit (only leaving out some
> optimizations), bcache is using it to protect a rb tree, which containts
> "things that are undergoing background writeback".

I think this is what makes bcache unique in the kernel, and I don't
think there's open coded locks elsewhere.

>
> For foreground writes, we've got to check if the write overlaps with
> anything in this rb tree. If so, we force _this_ write to writeback;
> background writeback will write stale data to the backing device, but
> that's fine because the data in the cache will still be marked as dirty.
>
> To add stuff to this rb tree - i.e. for background writeback to start
> flushing some dirty data - it's got to make sure it's not going to
> overwrite some in progress foreground writethrough write.
>
> Tracking every outstanding foreground write would be expensive, so
> foreground writes take a read lock on the rb tree (both to check it for
> anything they overlap with, and for their duration), and background
> writeback takes a write lock when it goes to scan for dirty data to add
> to the rb tree.
>
> Even if you complain it's not _just_ protecting the rb tree, we're still
> using it for mutual exclusion, plain and simple, and that's all a lock
> is.

Well, rwlocks and rwsems are not mutually exclusive ;-) The read side
has multiple accessors. A mutual exclusion also would be a single task
needing exclusive access to a resource. But as things are done in the
background, that is not the case. But we are arguing semantics here and
not helping with a solution.

>
>
> > > Also, nack this patch because increasing the number of atomic ops to
> > > shared cachelines in our fast path. If it does end up being open coded,
> > > I'll make a more efficient version.
> >
> > Perhaps I can whip up a "struct rwsem_non_owner" and have a very
> > limited API for that, and then you can use that.
>
> That would be perfect :)

OK, I'll see what I can do.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Luis Henriques: "[3.5.y.z extended stable] Linux 3.5.7.20"
Previous message: Peter Zijlstra: "Re: [PATCH 04/10] sched, fair: Shrink sg_lb_stats and play memsetgames"
In reply to: Kent Overstreet: "Re: [PATCH] bcache: Remove use of down/up_read_non_owner()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]