[no subject]

From: Tejun Heo
Date: Wed Jun 11 2014 - 20:47:02 EST


percpu areas are zeroed on allocation and, by its nature, accessed
from multiple cpus. Consider the following scenario.

p = NULL;

CPU-1 CPU-2
p = alloc_percpu() if (p)
WARN_ON(this_cpu_read(*p));

As all percpu areas are zeroed on allocation, CPU-2 should never
trigger the WARN; however, there's no barrier between zeroing of the
percpu regions and the assignment of @p or between testing of @p and
dereferencing it and CPU-2 may see garbage value from before the
clearing and trigger the warning.

Note that this may happen even on archs which don't require data
dependency barriers. While CPU-2 woudln't reorder percpu area access
above the testing of @p, nothing prevents CPU-1 from scheduling
zeroing after the assignment of @p.

This patch fixes the issue by adding a smp_wmb() before a newly
allocated percpu pointer is returned and a smp_read_barrier_depends()
in __verify_pcpu_ptr() which is guaranteed to be invoked at least once
before every percpu access. It currently may be invoked multiple
times per operation which isn't optimal. Future patches will update
the definitions so that the macro is invoked only once per access.

It can be argued that smp_read_barrier_depends() is the responsibility
of the caller; however, discerning local and remote accesses gets very
confusing with percpu areas (initialized remotely but local to this
cpu and vice-versa) and data dependency barrier is free on all archs
except for alpha, so I think it's better to include it as part of
percpu accessors and operations.

I wonder whether we also need a smp_wmb() for other __GFP_ZERO
allocations. There isn't one and requiring the users to perform
smp_wmb() to ensure the propagation of zeroing seems too subtle.

Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx>
Cc: David Howells <dhowells@xxxxxxxxxx>
Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
---
include/linux/percpu-defs.h | 15 ++++++++++++---
mm/percpu.c | 9 +++++++++
2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h
index a5fc7d0..b91bb37 100644
--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -1,6 +1,8 @@
#ifndef _LINUX_PERCPU_DEFS_H
#define _LINUX_PERCPU_DEFS_H

+#include <asm/barrier.h>
+
/*
* Base implementations of per-CPU variable declarations and definitions, where
* the section in which the variable is to be placed is provided by the
@@ -19,9 +21,15 @@
__attribute__((section(".discard"), unused))

/*
- * Macro which verifies @ptr is a percpu pointer without evaluating
- * @ptr. This is to be used in percpu accessors to verify that the
- * input parameter is a percpu pointer.
+ * This macro serves two purposes. It verifies @ptr is a percpu pointer
+ * without evaluating @ptr and provides the data dependency barrier paired
+ * with smp_wmb() at the end of the allocation path so that the memory
+ * clearing in the allocation path is visible to all percpu accsses.
+ *
+ * The existence of the data dependency barrier is guaranteed and percpu
+ * users can take advantage of it - e.g. percpu area updates followed by
+ * smp_wmb() and then a percpu pointer assignment are guaranteed to be
+ * visible to accessors which access through the assigned percpu pointer.
*
* + 0 is required in order to convert the pointer type from a
* potential array type to a pointer to a single item of the array.
@@ -29,6 +37,7 @@
#define __verify_pcpu_ptr(ptr) do { \
const void __percpu *__vpp_verify = (typeof((ptr) + 0))NULL; \
(void)__vpp_verify; \
+ smp_read_barrier_depends(); \
} while (0)

/*
diff --git a/mm/percpu.c b/mm/percpu.c
index 2ddf9a9..bd3cab8 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -816,6 +816,15 @@ area_found:
/* return address relative to base address */
ptr = __addr_to_pcpu_ptr(chunk->base_addr + off);
kmemleak_alloc_percpu(ptr, size);
+
+ /*
+ * The following wmb is paired with the data dependency barrier in
+ * the percpu accessors and guarantees that the zeroing of the
+ * percpu areas in pcpu_populate_chunk() is visible to all percpu
+ * accesses through the returned percpu pointer. The accessors get
+ * their data dependency barrier from __verify_pcpu_ptr().
+ */
+ smp_wmb();
return ptr;

fail_unlock:
--
1.9.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/