Re: [PATCH] ftrace: fix race between ftrace and insmod

From: Steven Rostedt
Date: Tue Dec 15 2015 - 12:37:32 EST


On Tue, 15 Dec 2015 11:26:41 +0800
"Zhang, Yanmin" <yanmin_zhang@xxxxxxxxxxxxxxx> wrote:

> > This seems very hackish, although I can't think of a better way at the
> > moment. But I would like not to add more code into module.c if
> > possible, and just use a notifier unless there's a real reason we can't
> > (as there was when we added the hook in the first place).
> > We would double-check it again. It's not a good idea to change common
> > module codes.
>
> I double-checked it. If using notifier, we have to add 2 new module notification
> events in enum module_state.
> For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED.
> At MODULE_STATE_INITING, call ftrace_module_init(mod);
> At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar function);
> At MODULE_STATE_COMING, call a new function which links the new start_pg
> to ftrace_pages.
> Such design can also fix another bug in current kernel that ftrace start_pg
> of the module is not cleaned up if complete_formation fails.
> However, we change module common codes. The new events only serve ftrace.
>
> If not holding ftrace_mutex across start_pg setup and complete_formation,
> some other variables are not protected well, such like ftrace_pages,
> ftrace_start_up, and so on.

OK, that was basically the reason the hook was added in the first place.

The reason the modules notifier doesn't work here is that there's too
many exits from the module code that may leave the mutex held.

I thought about this some more. So the issue is that we add the module
functions to the ftrace records. Then another task enables function
tracing before the module is fully set up. As ftrace is enabling
function tracing, the module sets its text to read-only, outside the
scope of ftrace setting all text to read-write. When ftrace gets to the
module code, it fails to do the modifications and you get the
ftrace_bug() splat.

Sounds right?

Now, I really dislike the taking of the ftrace mutex and returning back
to module handling. That is very subtle, fragile and prone to bugs. I
took a step back to find another solution and I think I found one.

Take the below patch and add to it (I'll keep this patch as mine, and
you can submit another patch to do the following). You don't need to
resend this patch, just send me a patch that does this:


Add the module notifier that calls ftrace_module_enable(mod), removing
it from ftrace_init_module(). Feel free to monkey with that function to
be able to be called directly if it can't already.

What my patch does is to create a new ftrace record flag DISABLED,
which is set to all functions in a module as it is added. Then later on
(in the module notifier that you will add), the flag is cleared. In
between, the module functions will be ignored.


If the module errors out and the notifier is not called, we don't care.
The records will simply be removed, and the flag will be meaningless.

I needed to move the code around to still let the module code get
enabled if tracing was already enabled. I even cleaned that code up as
I found it has gotten a bit messy.

-- Steve

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 4736a826baf5..660e7c698f3b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -357,6 +357,7 @@ bool is_ftrace_trampoline(unsigned long addr);
* REGS - the record wants the function to save regs
* REGS_EN - the function is set up to save regs.
* IPMODIFY - the record allows for the IP address to be changed.
+ * DISABLED - the record is not ready to be touched yet
*
* When a new ftrace_ops is registered and wants a function to save
* pt_regs, the rec->flag REGS is set. When the function has been
@@ -371,10 +372,11 @@ enum {
FTRACE_FL_TRAMP = (1UL << 28),
FTRACE_FL_TRAMP_EN = (1UL << 27),
FTRACE_FL_IPMODIFY = (1UL << 26),
+ FTRACE_FL_DISABLED = (1UL << 25),
};

-#define FTRACE_REF_MAX_SHIFT 26
-#define FTRACE_FL_BITS 6
+#define FTRACE_REF_MAX_SHIFT 25
+#define FTRACE_FL_BITS 7
#define FTRACE_FL_MASKED_BITS ((1UL << FTRACE_FL_BITS) - 1)
#define FTRACE_FL_MASK (FTRACE_FL_MASKED_BITS << FTRACE_REF_MAX_SHIFT)
#define FTRACE_REF_MAX ((1UL << FTRACE_REF_MAX_SHIFT) - 1)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index e290a30f2d0b..e7e15874fb7c 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2023,6 +2023,9 @@ static int ftrace_check_record(struct dyn_ftrace *rec, int enable, int update)

ftrace_bug_type = FTRACE_BUG_UNKNOWN;

+ if (rec->flags & FTRACE_FL_DISABLED)
+ return FTRACE_UPDATE_IGNORE;
+
/*
* If we are updating calls:
*
@@ -2849,64 +2852,41 @@ ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec)
return 1;
}

-static int referenced_filters(struct dyn_ftrace *rec)
-{
- struct ftrace_ops *ops;
- int cnt = 0;
-
- for (ops = ftrace_ops_list; ops != &ftrace_list_end; ops = ops->next) {
- if (ops_references_rec(ops, rec))
- cnt++;
- }
-
- return cnt;
-}
-
static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs)
{
struct ftrace_page *pg;
struct dyn_ftrace *p;
cycle_t start, stop;
unsigned long update_cnt = 0;
- unsigned long ref = 0;
- bool test = false;
+ unsigned long rec_flags = 0;
int i;

+ start = ftrace_now(raw_smp_processor_id());
+
/*
- * When adding a module, we need to check if tracers are
- * currently enabled and if they are set to trace all functions.
- * If they are, we need to enable the module functions as well
- * as update the reference counts for those function records.
+ * When a module is loaded, this function is called to convert
+ * the calls to mcount in its text to nops, and also to create
+ * an entry in the ftrace data. Now, if ftrace is activated
+ * after this call, but before the module sets its text to
+ * read-only, the modification of enabling ftrace can fail if
+ * the read-only is done while ftrace is converting the calls.
+ * To prevent this, the module's records are set as disabled
+ * and will be enabled after the call to set the module's text
+ * to read-only.
*/
- if (mod) {
- struct ftrace_ops *ops;
-
- for (ops = ftrace_ops_list;
- ops != &ftrace_list_end; ops = ops->next) {
- if (ops->flags & FTRACE_OPS_FL_ENABLED) {
- if (ops_traces_mod(ops))
- ref++;
- else
- test = true;
- }
- }
- }
-
- start = ftrace_now(raw_smp_processor_id());
+ if (mod)
+ rec_flags |= FTRACE_FL_DISABLED;

for (pg = new_pgs; pg; pg = pg->next) {

for (i = 0; i < pg->index; i++) {
- int cnt = ref;

/* If something went wrong, bail without enabling anything */
if (unlikely(ftrace_disabled))
return -1;

p = &pg->records[i];
- if (test)
- cnt += referenced_filters(p);
- p->flags = cnt;
+ p->flags = rec_flags;

/*
* Do the initial record conversion from mcount jump
@@ -2916,21 +2896,6 @@ static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs)
break;

update_cnt++;
-
- /*
- * If the tracing is enabled, go ahead and enable the record.
- *
- * The reason not to enable the record immediatelly is the
- * inherent check of ftrace_make_nop/ftrace_make_call for
- * correct previous instructions. Making first the NOP
- * conversion puts the module to the correct state, thus
- * passing the ftrace_make_call check.
- */
- if (ftrace_start_up && cnt) {
- int failed = __ftrace_replace_code(p, 1);
- if (failed)
- ftrace_bug(failed, p);
- }
}
}

@@ -4938,6 +4903,19 @@ static int ftrace_process_locs(struct module *mod,

#define next_to_ftrace_page(p) container_of(p, struct ftrace_page, next)

+static int referenced_filters(struct dyn_ftrace *rec)
+{
+ struct ftrace_ops *ops;
+ int cnt = 0;
+
+ for (ops = ftrace_ops_list; ops != &ftrace_list_end; ops = ops->next) {
+ if (ops_references_rec(ops, rec))
+ cnt++;
+ }
+
+ return cnt;
+}
+
void ftrace_release_mod(struct module *mod)
{
struct dyn_ftrace *rec;
@@ -4980,12 +4958,82 @@ void ftrace_release_mod(struct module *mod)
mutex_unlock(&ftrace_lock);
}

+static void ftrace_module_enable(struct module *mod)
+{
+ struct dyn_ftrace *rec;
+ struct ftrace_page *pg;
+
+ mutex_lock(&ftrace_lock);
+
+ if (ftrace_disabled)
+ goto out_unlock;
+
+ /*
+ * If the tracing is enabled, go ahead and enable the record.
+ *
+ * The reason not to enable the record immediatelly is the
+ * inherent check of ftrace_make_nop/ftrace_make_call for
+ * correct previous instructions. Making first the NOP
+ * conversion puts the module to the correct state, thus
+ * passing the ftrace_make_call check.
+ *
+ * We also delay this to after the module code already set the
+ * text to read-only, as we now need to set it back to read-write
+ * so that we can modify the text.
+ */
+ if (ftrace_start_up)
+ ftrace_arch_code_modify_prepare();
+
+ do_for_each_ftrace_rec(pg, rec) {
+ int cnt;
+ /*
+ * do_for_each_ftrace_rec() is a double loop.
+ * module text shares the pg. If a record is
+ * not part of this module, then skip this pg,
+ * which the "break" will do.
+ */
+ if (!within_module_core(rec->ip, mod))
+ break;
+
+ cnt = 0;
+
+ /*
+ * When adding a module, we need to check if tracers are
+ * currently enabled and if they are, and can trace this record,
+ * we need to enable the module functions as well as update the
+ * reference counts for those function records.
+ */
+ if (ftrace_start_up)
+ cnt += referenced_filters(rec);
+
+ /* This clears FTRACE_FL_DISABLED */
+ rec->flags = cnt;
+
+ if (ftrace_start_up && cnt) {
+ int failed = __ftrace_replace_code(rec, 1);
+ if (failed) {
+ ftrace_bug(failed, rec);
+ goto out_loop;
+ }
+ }
+
+ } while_for_each_ftrace_rec();
+
+ out_loop:
+ if (ftrace_start_up)
+ ftrace_arch_code_modify_post_process();
+
+ out_unlock:
+ mutex_unlock(&ftrace_lock);
+}
+
static void ftrace_init_module(struct module *mod,
unsigned long *start, unsigned long *end)
{
if (ftrace_disabled || start == end)
return;
ftrace_process_locs(mod, start, end);
+ ftrace_module_enable(mod);
}

void ftrace_module_init(struct module *mod)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/