Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()

From: Paul E. McKenney
Date: Mon Oct 26 2015 - 10:56:03 EST


On Mon, Oct 26, 2015 at 09:45:06AM +0100, Ingo Molnar wrote:
>
> * Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> > From: Patrick Marlier <patrick.marlier@xxxxxxxxx>
> >
> > The current list_entry_rcu() implementation copies the pointer to a stack
> > variable, then invokes rcu_dereference_raw() on it. This results in an
> > additional store-load pair. Now, most compilers will emit normal store
> > and load instructions, which might seem to be of negligible overhead,
> > but this results in a load-hit-store situation that can cause surprisingly
> > long pipeline stalls, even on modern microprocessors. The problem is
> > that it takes time for the store to get the store buffer updated, which
> > can delay the subsequent load, which immediately follows.
> >
> > This commit therefore switches to the lockless_dereference() primitive,
> > which does not expect the __rcu annotations (that are anyway not present
> > in the list_head structure) and which, like rcu_dereference_raw(),
> > does not check for an enclosing RCU read-side critical section.
> > Most importantly, it does not copy the pointer, thus avoiding the
> > load-hit-store overhead.
> >
> > Signed-off-by: Patrick Marlier <patrick.marlier@xxxxxxxxx>
> > [ paulmck: Switched to lockless_dereference() to suppress sparse warnings. ]
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > ---
> > include/linux/rculist.h | 5 +----
> > 1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> > index 17c6b1f84a77..5ed540986019 100644
> > --- a/include/linux/rculist.h
> > +++ b/include/linux/rculist.h
> > @@ -247,10 +247,7 @@ static inline void list_splice_init_rcu(struct list_head *list,
> > * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
> > */
> > #define list_entry_rcu(ptr, type, member) \
> > -({ \
> > - typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
> > - container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
> > -})
> > + container_of(lockless_dereference(ptr), type, member)
>
> So this commit:
>
> 8db70b132dd5 ("rculist: Make list_entry_rcu() use lockless_dereference()")
>
> when merged with Linus's latest tree, triggers the following build failure on
> allyesconfig/allmodconfig x86:
>
> triton:~/tip> make fs/fs-writeback.o
> CHK include/config/kernel.release
> CHK include/generated/uapi/linux/version.h
> CHK include/generated/utsrelease.h
> CHK include/generated/bounds.h
> CHK include/generated/timeconst.h
> CHK include/generated/asm-offsets.h
> CALL scripts/checksyscalls.sh
> CC fs/fs-writeback.o
> In file included from fs/fs-writeback.c:16:0:
> fs/fs-writeback.c: In function âbdi_split_work_to_wbsâ:
> include/linux/compiler.h:281:20: error: lvalue required as unary â&â operand
> __read_once_size(&(x), __u.__c, sizeof(x)); \
> ^
> include/linux/kernel.h:811:49: note: in definition of macro âcontainer_ofâ
> const typeof( ((type *)0)->member ) *__mptr = (ptr); \
> ^
> include/linux/compiler.h:286:22: note: in expansion of macro â__READ_ONCEâ
> #define READ_ONCE(x) __READ_ONCE(x, 1)
> ^
> include/linux/compiler.h:533:26: note: in expansion of macro âREAD_ONCEâ
> typeof(p) _________p1 = READ_ONCE(p); \
> ^
> include/linux/rculist.h:250:15: note: in expansion of macro âlockless_dereferenceâ
> container_of(lockless_dereference(ptr), type, member)
> ^
> fs/fs-writeback.c:782:29: note: in expansion of macro âlist_entry_rcuâ
> struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
> ^
> include/linux/compiler.h:283:28: error: lvalue required as unary â&â operand
> __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
> ^
> include/linux/kernel.h:811:49: note: in definition of macro âcontainer_ofâ
> const typeof( ((type *)0)->member ) *__mptr = (ptr); \
> ^
> include/linux/compiler.h:286:22: note: in expansion of macro â__READ_ONCEâ
> #define READ_ONCE(x) __READ_ONCE(x, 1)
> ^
> include/linux/compiler.h:533:26: note: in expansion of macro âREAD_ONCEâ
> typeof(p) _________p1 = READ_ONCE(p); \
> ^
> include/linux/rculist.h:250:15: note: in expansion of macro âlockless_dereferenceâ
> container_of(lockless_dereference(ptr), type, member)
> ^
> fs/fs-writeback.c:782:29: note: in expansion of macro âlist_entry_rcuâ
> struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
> ^
> scripts/Makefile.build:258: recipe for target 'fs/fs-writeback.o' failed
> make[1]: *** [fs/fs-writeback.o] Error 1
> Makefile:1526: recipe for target 'fs/fs-writeback.o' failed
> make: *** [fs/fs-writeback.o] Error 2
>
> It's this new usage in fs/fs-writeback.c:
>
> static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
> struct wb_writeback_work *base_work,
> bool skip_if_busy)
> {
> struct bdi_writeback *last_wb = NULL;
> struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,

I believe that the above should instead be:

struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next,

After all, RCU read-side list primitives need to fetch pointers in order
to traverse those pointers in an RCU-safe manner. The patch below clears
this up for me, does it also work for you?

Thanx, Paul

------------------------------------------------------------------------

commit 9221c4e78cc3511cd15b1433617ae2548ad8f631
Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Date: Mon Oct 26 07:47:20 2015 -0700

fs: Fix argument to list_entry_rcu()

The list_entry_rcu() macro requires that its first argument be the
pointer to the list_head contained in the structure whose pointer is to
be returned, but fetched in an RCU-safe manner. This commit therefore
fixes the use in bdi_split_work_to_wbs() to follow this convention.

Reported-by: Ingo Molnar <mingo@xxxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 29e4599f6fc1..126d4de8faad 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -779,7 +779,7 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
bool skip_if_busy)
{
struct bdi_writeback *last_wb = NULL;
- struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
+ struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next,
struct bdi_writeback, bdi_node);

might_sleep();

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/