Re: [PATCH v2 00/15] tracing: 'hist' triggers

From: Alexei Starovoitov
Date: Tue Mar 03 2015 - 21:23:03 EST


On 3/3/15 7:47 AM, Tom Zanussi wrote:
On Mon, 2015-03-02 at 18:31 -0800, Alexei Starovoitov wrote:
On Mon, Mar 2, 2015 at 5:18 PM, Tom Zanussi <tom.zanussi@xxxxxxxxxxxxxxx> wrote:

I'm not proposing to use asm or C for this 'hist->bpf' tool.
Keep proposed 'hist:keys=...:vals=...' syntax and generate
bpf program on the fly based on this string.
So this user tool will take string, generate program, load
and attach it. Everything based on this single string input.
With the examples you mentioned in docs, it's not hard.
It will be more involved than patch 12, but way more generic
and easily extensible when 'hist:keys=...' string would need
to be extended.


Hmm, this seems silly to me - doing all that work just to get back to
where we already are.

your 'hist->bpf' tool could have been first to solve 'bpf hard to use'
problem and overtime it could have evolved into full dtrace alternative.
Whereas by adding 'hist:keys=..' parsing to kernel you'll be stuck
with it and somebody else's dtrace-like tool will supersede it.

I don't think you can start requiring all new tracing functionality to
embed a code generator/compiler, or as a result force any particular
interface on users.

force interface on users?!
I think I've explained enough that bpf would not be a user visible
interface. It's kernel to user land interface. In my proposal of
'hist->bpf' tool users won't even see that bpf is being generated
and run underneath. Same with dtrace-like scripting. Users will
see scripting language and won't care how it's compiled and
executed inside kernel. Multiple different languages and interfaces
are possible. Including hist-like text that's not a language.

> If it's possible to factor out a common
infrastructure that can accommodate different user interfaces and
preferences, don't you think it makes sense to do that?

In any case, outside of the boilerplate tracing infrastructure code, it
seems to me that a lot of the code in the main hist trigger patches is
code that you'd also need for a tracepoint/bpf interface. I think we've
already agreed that it would be good to work towards being able to share
that, so for the next version, I'll see what I can come up with.

yes. let's see what pieces can be shared. Though I don't want to
sacrifice long term goals for short term solution.
In case of bpf_maps, the objective is to share data between
kernel and user space by single abstraction with different
implementations underneath. In some use cases kernel programs only
do lookups into maps, while user space concurrently adding/deleting
entries (so no allocations from programs)
In your patch 4 you're adding kernel only access to maps but planning
to use hash type only and also not exposing these maps to user space
at all... that defeats bpf_map purpose and you only reusing ~200
lines of hashtab.c code, but also need to hack it for pre-allocate
and make config_bpf_syscall a strong dependency for 'hist'. why?
'hist' patch set would have been much shorter if you just used
hashtable.h or rhashtable.h or even rbtree.h (so you don't need
to do sorting in patch 13). The actual hash insert/walk code
is tiny and trivial.

I've looked through the patches again and they seem to have serious
bugs when it comes to object life time:
- patch 5 that adds clear_map is completely broken, calling
delete_all_elements() by wrapping it in rcu_read_lock() ?!
- patch 6 adds 'free_elem' callback that is called right from
htab_map_delete_elem() while we're still under rcu.. ?!
- patch 12 does
struct hist_trigger_entry *entry;
tracing_map_lookup_elem(hist_data->map, next_key, &entry);
hist_trigger_entry_print(m, hist_data, next_key, entry);
so when you're storing a pointer inside the map the rcu protection
of map_lookup protects only the value of 'entry' pointer.
The actual contents of *entry are not protected by anything,
so even if you fix 5 and 6, you'll still have severe memory corruptions,
since nothing guarantees that contents of *entry are valid when
hist_trigger_entry_print() is trying to access it.

I think you'd be better off using vanilla hashtable.h

Thanks

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/