Re: system gets stuck in a lock during boot

From: Justin P. Mattock
Date: Tue Oct 06 2009 - 18:31:44 EST


Jason Baron wrote:
On Mon, Oct 05, 2009 at 09:24:09PM -0400, Steven Rostedt wrote:
On Fri, 2009-10-02 at 17:12 -0400, Jason Baron wrote:

hi Justin,

I've been playing around with gcc '4.5' as well and hit a panic that
looks very similar to what you've seen with stock 2.6.31 - I haven't
seen it anywhere else. Anyways, it seems to be some sort of alignment
issue with the 'struct ftrace_event_call'. I'm not sure yet if this is a
compiler or kernel issue. But the following kernel patch fixes the issue
for me. It would be interesting to verify if the patch also resolves the
issue for you.

thanks,

-Jason


diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 6ad76bf..0029af4 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -164,6 +164,7 @@
LIKELY_PROFILE() \
BRANCH_PROFILE() \
TRACE_PRINTKS() \
+ . = ALIGN(32); \
FTRACE_EVENTS() \
TRACE_SYSCALLS()

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index a81170d..43f9f1e 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -124,7 +124,7 @@ struct ftrace_event_call {
atomic_t profile_count;
int (*profile_enable)(struct ftrace_event_call *);
void (*profile_disable)(struct ftrace_event_call *);
-};
+} __attribute__((aligned(32)));

#define MAX_FILTER_PRED 32
#define MAX_FILTER_STR_VAL 128
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index f64fbaa..4697fb6 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -600,7 +600,7 @@ static int ftrace_raw_init_event_##call(void) \
} \
\
static struct ftrace_event_call __used \
-__attribute__((__aligned__(4))) \
+__attribute__((__aligned__(32))) \
__attribute__((section("_ftrace_events"))) event_##call = { \
.name = #call, \
.system = __stringify(TRACE_SYSTEM), \
Are all alignments needed? Or just adding one might help. Or removing
the one directly above?

-- Steve


So the problem I'm seeing is an oops on boot caused by the call->system pointer
dereference in event_create_dir(). The 'call' variable is of type 'struct
ftrace_event_call'.

What's going on is that the 'struct ftrace_event_call' is of size 168 bytes
(sizeof(struct ftrace_event_call) = 168 = 0xA8). However, in memory the
structures are 16-byte aligned. Thus, the stride for walking through the
structures needs to be 176 (0xB0), but instead it's 168, causing the oops.
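
A minimal sketch of that stride arithmetic, using the numbers above. The
helpers round_up(), entry_offset() and stride_demo() are hypothetical names
for illustration only, not kernel functions:

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align; align must be a power of two. */
static size_t round_up(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Offset of entry idx from the section start when stepping by stride bytes. */
static size_t entry_offset(size_t stride, size_t idx)
{
	return stride * idx;
}

/* Self-check using the numbers from this report. */
static void stride_demo(void)
{
	size_t size = 168;                    /* sizeof as reported: 0xA8 */
	size_t spacing = round_up(size, 16);  /* spacing of 16-byte aligned entries */

	assert(spacing == 176);               /* 0xB0 */
	/* Stepping by sizeof lands 8 bytes short of the second entry. */
	assert(entry_offset(spacing, 1) - entry_offset(size, 1) == 8);
}
```

Each further step by 168 drifts another 8 bytes from the real entry, so the
fields read through 'call' no longer line up with the data in the section.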

I've only seen this issue while using gcc (GCC) 4.5.0 20090916, on a
vanilla 2.6.31 kernel.

That said, I'm not sure the compiler is doing the wrong thing here. The
'struct ftrace_event_call' contains an embedded 'struct list_head' which
is 16 bytes. According to the gcc docs, the aligned attribute 'specifies a
minimum alignment for the variable or structure field, measured in bytes'.
Thus, at least according to the docs, gcc can increase the alignment of the
'struct ftrace_event_call', from its original specification of 4, to 16. Even
in the case where we are working correctly the structures are 8-byte aligned.

Thus, I would recommend the patch below as a preventive measure. It's
the minimal patch I've found to resolve this issue. In general, if we
are going to walk data structures embedded in a special elf section, I
think the general rule needs to be to set the alignment to the power of
two which is greater than or equal to the size of the largest item in the
structure.
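
A rough sketch of that rule, assuming a GCC-compatible compiler. The types
demo_list_head and demo_event and the helpers pow2_at_least() and
align_demo() are hypothetical stand-ins, not the kernel's definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* Hypothetical record; aligning the type pads sizeof to a multiple of 16. */
struct demo_event {
	struct demo_list_head list;
	char payload[152];
} __attribute__((aligned(16)));

/* Smallest power of two that is >= n (for n >= 1). */
static size_t pow2_at_least(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Because sizeof is a multiple of the type's alignment, objects placed
 * back to back at that alignment sit exactly sizeof apart.
 */
static void align_demo(void)
{
	assert(_Alignof(struct demo_event) == 16);
	assert(sizeof(struct demo_event) % _Alignof(struct demo_event) == 0);
}
```

Putting the alignment on the type rather than on each variable means sizeof
already includes the tail padding, so a walker that steps by sizeof matches
the spacing of entries aligned to that same boundary.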

thanks,

-Jason

Signed-off-by: Jason Baron <jbaron@xxxxxxxxxx>


diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index a81170d..7182f03 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -124,7 +124,10 @@ struct ftrace_event_call {
atomic_t profile_count;
int (*profile_enable)(struct ftrace_event_call *);
void (*profile_disable)(struct ftrace_event_call *);
-};
+} __attribute__((aligned(16)));
+
+/* Align to the largest field in the data structure:
+ * sizeof(struct list_head) = 16 */

#define MAX_FILTER_PRED 32
#define MAX_FILTER_STR_VAL 128
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index f64fbaa..e344e81 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -600,7 +600,6 @@ static int ftrace_raw_init_event_##call(void) \
} \
\
static struct ftrace_event_call __used \
-__attribute__((__aligned__(4))) \
__attribute__((section("_ftrace_events"))) event_##call = { \
.name = #call, \
.system = __stringify(TRACE_SYSTEM), \




Shoot, I don't know why this is still hitting.
I tried both patches and it still happens.
As of now the only thing I can think of, besides looking
at the kernel/compiler, is the sysvinit patch that loads
the policy (maybe something in there is old/outdated).

(BTW: not sure if it means anything, but this system is x86_64
built from the multilib clfs, but with no 32 bit libs, pretty much
how fedora11 has their system built)

Justin P. Mattock