[PATCH 0/1] rcu_sync: Cleanup the CONFIG_PROVE_RCU checks

From: Oleg Nesterov
Date: Fri Sep 11 2015 - 12:01:54 EST


On 09/10, Paul E. McKenney wrote:
>
> On Thu, Sep 10, 2015 at 03:59:42PM +0200, Oleg Nesterov wrote:
> > On 09/09, Paul E. McKenney wrote:
> > >
> > > This is obsolete, but its replacement is the same patch.
> >
> > fbe3b97183f84155d81e506b1aa7d2ce986f7a36 in linux-rcu.git#experimental
> > I guess?
> >
> > > Oleg, Davidlohr, am I missing something on how percpu_rwsem or
> > > locktorture work?
> >
> > No, I think the patch is fine. Thanks for doing this! I was going to
> > send something like this change too. And in fact I am still thinking
> > about another test which plays with rcu_sync only, but probably we
> > need some cleanups first (and we need them anyway). I'll try to do
> > this a bit later.
>
> I would welcome an rcu_sync-specific torture patch!

I want it much more than you ;) I have already warned you, I'll send
more rcu_sync patches. The current code is actually a very early draft
which was written during the discussion with Peter long ago. I sent
it unchanged because a) it was already reviewed and b) I tested it a
bit in the past.

We can greatly simplify this code and at the same time make it more
useful. Actually I already have the patches. The 1st one removes
rcu_sync->cb_state and gp_ops->sync(). This makes the state machine
almost self-obvious and allows other improvements. See the resulting
(pseudo) code at the end.

But again, I'll try very much to write the test before I send the patch.


Until then, let me send this trivial cleanup. The CONFIG_PROVE_RCU
code looks trivial but is imo really annoying. And it is not complete,
so let's document this at least. Plus rcu_lockdep_assert() looks more
consistent.


> > > +void torture_percpu_rwsem_init(void)
> > > +{
> > > + BUG_ON(percpu_init_rwsem(&pcpu_rwsem));
> > > +}
> > > +
> >
> > Aha, we don't really need this... I mean we can use the static initialiser
> > which can also be used by uprobes and cgroups. I'll try to send the patch
> > tomorrow.
>
> Very good, please do!

Hmm. I am a liar. I won't send this patch, at least not today.

The change I had in mind is very simple,

#define DECLARE_PERCPU_RWSEM(sem)					\
	static DEFINE_PER_CPU(unsigned int, sem##_counters);		\
	struct percpu_rw_semaphore sem = {				\
		.fast_read_ctr = &sem##_counters,			\
		...							\
	}

and yes, uprobes and cgroups can use it.

But somehow I missed that we can't use it to define a _static_ sem,

static DECLARE_PERCPU_RWSEM(sem);

obviously won't work. And damn, I am embarrassed to admit that I spent several
hours trying to invent something but failed. Perhaps we can add 2 helpers,
DECLARE_PERCPU_RWSEM_GLOBAL() and DECLARE_PERCPU_RWSEM_STATIC().
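
To illustrate the two-helper idea, here is a minimal userland sketch. It is
not kernel code: struct fake_percpu_rwsem and the FAKE_RWSEM_DEFINE*() names
are hypothetical, and a plain variable stands in for DEFINE_PER_CPU(). The
point is only that a "static" written in front of a multi-declaration macro
attaches to the first declaration it expands to, so the storage class has to
be passed into the macro and applied to each declaration.

```c
#include <assert.h>

/* Hypothetical userland stand-in for struct percpu_rw_semaphore. */
struct fake_percpu_rwsem {
	unsigned int *fast_read_ctr;
};

/*
 * The storage class is a macro argument, so it is applied to BOTH the
 * counter and the semaphore. An empty argument means "global".
 */
#define FAKE_RWSEM_DEFINE(storage, sem)				\
	storage unsigned int sem##_counters;			\
	storage struct fake_percpu_rwsem sem = {		\
		.fast_read_ctr = &sem##_counters,		\
	}

#define FAKE_RWSEM_DEFINE_GLOBAL(sem)	FAKE_RWSEM_DEFINE(, sem)
#define FAKE_RWSEM_DEFINE_STATIC(sem)	FAKE_RWSEM_DEFINE(static, sem)

FAKE_RWSEM_DEFINE_GLOBAL(global_sem);	/* external linkage */
FAKE_RWSEM_DEFINE_STATIC(local_sem);	/* private to this file */
```

Whether the real helpers should look like this, or how they interact with
DEFINE_PER_CPU(), is left open here.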

Oleg.

-------------------------------------------------------------------------------
static const struct {
	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
	void (*wait)(void);		// TODO: remove this
#ifdef CONFIG_PROVE_RCU
	int (*held)(void);
#endif
} gp_ops[] = {
	...
};

// COMMENT to explain these states
enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };

#define rss_lock gp_wait.lock

// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!1!!!!!!!!
// XXX code must be removed when we split rcu_sync_enter() into start + wait
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

static void rcu_sync_func(struct rcu_head *rcu)
{
	struct rcu_sync *rsp = container_of(rcu, struct rcu_sync, cb_head);
	unsigned long flags;

	BUG_ON(rsp->gp_state == GP_IDLE);
	BUG_ON(rsp->gp_state == GP_PASSED);

	spin_lock_irqsave(&rsp->rss_lock, flags);
	if (rsp->gp_count) {
		/*
		 * COMMENT.
		 */
		rsp->gp_state = GP_PASSED;
		wake_up_locked(&rsp->gp_wait);
	} else if (rsp->gp_state == GP_REPLAY) {
		/*
		 * A new rcu_sync_exit() has happened; requeue the callback
		 * to catch a later GP.
		 */
		rsp->gp_state = GP_EXIT;
		gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
	} else {
		/*
		 * We're at least a GP after rcu_sync_exit(); everybody will now
		 * have observed the write side critical section. Let 'em rip!
		 */
		BUG_ON(rsp->gp_state == GP_ENTER);	// XXX
		rsp->gp_state = GP_IDLE;
	}
	spin_unlock_irqrestore(&rsp->rss_lock, flags);
}

static void rcu_sync_call(struct rcu_sync *rsp)
{
	// TODO:
	// This is called by might_sleep() code outside of ->rss_lock,
	// we can avoid ->call() in some cases (say rcu_blocking_is_gp())
	gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
}

void rcu_sync_enter(struct rcu_sync *rsp)
{
	int gp_count, gp_state;

	spin_lock_irq(&rsp->rss_lock);
	gp_count = rsp->gp_count++;
	gp_state = rsp->gp_state;
	if (gp_state == GP_IDLE)
		rsp->gp_state = GP_ENTER;
	spin_unlock_irq(&rsp->rss_lock);

	BUG_ON(gp_count != 0 && gp_state == GP_IDLE);
	BUG_ON(gp_count == 0 && gp_state == GP_PASSED);
	BUG_ON(gp_count == 0 && gp_state == GP_ENTER);	// XXX

	if (gp_state == GP_IDLE)
		rcu_sync_call(rsp);

	wait_event(rsp->gp_wait, rsp->gp_state != GP_ENTER);
	BUG_ON(rsp->gp_state < GP_PASSED);
}

void rcu_sync_exit(struct rcu_sync *rsp)
{
	bool need_call = false;

	BUG_ON(rsp->gp_state == GP_IDLE);
	BUG_ON(rsp->gp_state == GP_ENTER);	// XXX

	spin_lock_irq(&rsp->rss_lock);
	if (!--rsp->gp_count) {
		if (rsp->gp_state == GP_PASSED) {
			need_call = true;
			rsp->gp_state = GP_EXIT;
		} else if (rsp->gp_state == GP_EXIT) {
			rsp->gp_state = GP_REPLAY;
		}
	}
	spin_unlock_irq(&rsp->rss_lock);

	// Comment to explain why we do not care if another enter()
	// and perhaps even exit() comes after spin_unlock().
	if (need_call)
		rcu_sync_call(rsp);
}

void rcu_sync_dtor(struct rcu_sync *rsp)
{
	int gp_state;

	BUG_ON(rsp->gp_count);
	BUG_ON(rsp->gp_state == GP_ENTER);	// XXX
	BUG_ON(rsp->gp_state == GP_PASSED);

	spin_lock_irq(&rsp->rss_lock);
	if (rsp->gp_state == GP_REPLAY)
		rsp->gp_state = GP_EXIT;
	gp_state = rsp->gp_state;
	spin_unlock_irq(&rsp->rss_lock);

	// TODO: add another wake_up_locked() into rcu_sync_func(),
	// use wait_event + spin_lock_wait, remove gp_ops->wait().

	if (gp_state != GP_IDLE) {
		gp_ops[rsp->gp_type].wait();
		BUG_ON(rsp->gp_state != GP_IDLE);
	}
}
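
The state transitions in the pseudo code above can be traced with a small
single-threaded userland model. This is only a sketch under heavy
assumptions: there is no locking and no wait_event(), a grace period is
"elapsed" by calling model_gp() by hand, rcu_sync_enter() is reduced to its
non-blocking half, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };

struct model {
	int gp_state;
	int gp_count;
	bool cb_pending;	/* stands in for a queued rcu_head callback */
};

/* Stand-in for gp_ops[].call(): just remember that a callback is queued. */
static void model_call(struct model *m)
{
	assert(!m->cb_pending);
	m->cb_pending = true;
}

/* Mirrors the three branches of rcu_sync_func(). */
static void model_func(struct model *m)
{
	if (m->gp_count) {
		m->gp_state = GP_PASSED;
	} else if (m->gp_state == GP_REPLAY) {
		m->gp_state = GP_EXIT;
		model_call(m);
	} else {
		m->gp_state = GP_IDLE;
	}
}

/* Pretend one grace period elapsed; returns true if a callback ran. */
static bool model_gp(struct model *m)
{
	if (!m->cb_pending)
		return false;
	m->cb_pending = false;
	model_func(m);
	return true;
}

/* The non-blocking half of rcu_sync_enter(). */
static void model_enter(struct model *m)
{
	int gp_state = m->gp_state;

	m->gp_count++;
	if (gp_state == GP_IDLE) {
		m->gp_state = GP_ENTER;
		model_call(m);
	}
}

/* Mirrors rcu_sync_exit(). */
static void model_exit(struct model *m)
{
	if (!--m->gp_count) {
		if (m->gp_state == GP_PASSED) {
			m->gp_state = GP_EXIT;
			model_call(m);
		} else if (m->gp_state == GP_EXIT) {
			m->gp_state = GP_REPLAY;
		}
	}
}

/* One full cycle including the GP_REPLAY path; returns the final state. */
static int model_scenario(void)
{
	struct model m = { .gp_state = GP_IDLE };

	model_enter(&m);	/* IDLE -> ENTER, callback queued */
	model_gp(&m);		/* ENTER -> PASSED */
	model_exit(&m);		/* PASSED -> EXIT, callback queued */
	model_enter(&m);	/* stays EXIT, no new callback */
	model_exit(&m);		/* EXIT -> REPLAY */
	model_gp(&m);		/* REPLAY -> EXIT, callback requeued */
	model_gp(&m);		/* EXIT -> IDLE */
	return m.gp_state;
}
```

The model says nothing about races between the callback and concurrent
enter()/exit(); it only makes the intended order of states easy to step
through.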
