Re: [2.6.16 PATCH] Filessytem Events Reporter V3

From: Evgeniy Polyakov
Date: Tue Apr 11 2006 - 05:21:56 EST

Next message: Yi Yang: "Re: [2.6.16 PATCH] Filessytem Events Reporter V3"
Previous message: Oleg Nesterov: "Re: [PATCH] fix de_thread() vs do_coredump() deadlock"
In reply to: Yi Yang: "Re: [2.6.16 PATCH] Filessytem Events Reporter V3"
Next in thread: Yi Yang: "Re: [2.6.16 PATCH] Filessytem Events Reporter V3"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Apr 11, 2006 at 04:59:18PM +0800, Yi Yang (yang.y.yi@xxxxxxxxx) wrote:

> >>+ if (skb->len >= FSEVENT_FILTER_MSGSIZE) {
> >>
> >
> >I'm not sure about your size checks.
> >I think it should be compared with nlhdr->nlmsg_len?
> >
> At this point, skb->len should be the same as nlhdr->nlmsg_len.

Hmm, skb->len includes size of netlink header, but nlhdr->nlmsg_len does
not.

> >>+#define DEFINE_FILTER_MATCH_FUNC(filtertype, key) \
> >>+ static int match_##filtertype(listener * p, \
> >>+ struct fsevent * event, \
> >>+ struct sk_buff * skb) \
> >>+ { \
> >>+ int ret = 0; \
> >>+ filtertype * xfilter = NULL; \
> >>+ struct sk_buff * skb2 = NULL; \
> >>+ struct list_head * head = &(p->key##_filter_list_head); \
> >>+ list_for_each_entry(xfilter, head, list) { \
> >>+ if (xfilter->key != event->key) \
> >>+ continue; \
> >>+ ret = filter_fsevent(xfilter->mask, event->type); \
> >>+ if ( ret != 0) \
> >>+ return -1; \
> >>+ skb2 = skb_clone(skb, GFP_KERNEL); \
> >>+ if (skb2 == NULL) \
> >>
> >
> >Coding style.
> >
> >
> >>+ return -1; \
> >>+ NETLINK_CB(skb2).dst_group = 0; \
> >>+ NETLINK_CB(skb2).dst_pid = p->pid; \
> >>+ NETLINK_CB(skb2).pid = 0; \
> >>+ return (netlink_unicast(fsevent_sock, skb2, \
> >>+ p->pid, MSG_DONTWAIT)); \
> >>+ } \
> >>+ return -1; \
> >>+ } \
> >>+
> >>+DEFINE_FILTER_MATCH_FUNC(pid_filter, pid)
> >>+
> >>+DEFINE_FILTER_MATCH_FUNC(uid_filter, uid)
> >>+
> >>+DEFINE_FILTER_MATCH_FUNC(gid_filter, gid)
> >>
> >
> >You send the same data for each type of filters, maybe it is design
> >approach, but why don't you want to send that data in one skb?
> >
> netlink control block is not the same, netlink_broadcast is a typical case.

Yes, I see, pid is changed.

> >>+#define MATCH_XID(key, listenerp, event, skb) \
> >>+ ret = match_##key##_filter(listenerp, event, skb); \
> >>+ if (ret == 0) { \
> >>+ kfree_skb(skb); \
> >>+ continue; \
> >>
> >
> >Your match funtions can not return 0.
> >
> It can, if sending is successfull, netlink_unicast will return 0.

No, it returns skb->len on success.
netlink_broadcast() returns 0 on success.

> >>+static void __exit fsevent_exit(void)
> >>+{
> >>+ listener * p = NULL, * q = NULL;
> >>+ int cpu;
> >>+ int wait_flag = 0;
> >>+ struct sk_buff_head * skb_head = NULL;
> >>+
> >>+ fsevents_mask = 0;
> >>+ _raise_fsevent = 0;
> >>+ exit_flag = 1;
> >>+
> >>+ for_each_cpu(cpu)
> >>+ schedule_work(&per_cpu(fsevent_work, cpu));
> >>+
> >>+ while (1) {
> >>+ wait_flag = 0;
> >>+ for_each_cpu(cpu) {
> >>+ skb_head = &per_cpu(fsevent_send_queue, cpu);
> >>+ if (skb_head->qlen != 0) {
> >>+ wait_flag = 1;
> >>+ break;
> >>+ }
> >>+ }
> >>+ if (wait_flag == 1) {
> >>+ set_current_state(TASK_INTERRUPTIBLE);
> >>+ schedule_timeout(HZ/10);
> >>+ } else
> >>+ break;
> >>+ }
> >>
> >
> >This is still broken.
> >You race with schedule_work() in this loop. It requires
> >flush_scheduled_work().
> >
> >And I still have soume doubts about __raise_fsevent().
> >What if you set fsevents_mask to zero after __raise_fsevent() is
> >started, but not yet queued an skb, and above loop and scheduled work
> >are completed?
> >
> I think it is OK, schedule_timeout will release cpu to work queues,
> work queues should have enough time
> to finish their works, I don't know what is your reason.

It is not guaranteed that scheduled work will be processed until
flush_scheduled_work() completion, no matter how many times processor
has idle cycles.

Second issue is that both above loop and work can be finished, but some
__raise_fsevent() will be still in progress.

> >You need some type of completion of the last worker...
> >
> >
> >>+ atomic_set(&fsevent_sock->sk_rmem_alloc, 0);
> >>+ atomic_set(&fsevent_sock->sk_wmem_alloc, 0);
> >>
> >
> >This is really wrong, since it hides skb processing errors like double
> >freeing or leaks.
> >
> If userspace application terminated exceptionally, there are some skbs
> not to be consumed on socket, so
> if you rmmod it, sock_release will report some failure information, the
> above two statements will remove this
> error.

All queues will be flushed, when socket is freed, and if sock_release() shows
that assertion is failed, this definitely means you broke socket accounting,
for example freed skb two times.

> >>+ sock_release(fsevent_sock->sk_socket);

...

> >Btw, it would be nice to have some kind of microbenchmark,
> >like http://permalink.gmane.org/gmane.linux.kernel/292755
> >just to see how things go...
> >
> I have a userspace application to test fsevent, I'll release it to
> community in order to find more issues on
> fsevent.

And please publish some numbers so people could make some prognosis of
system behaviour.

--
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Yi Yang: "Re: [2.6.16 PATCH] Filessytem Events Reporter V3"
Previous message: Oleg Nesterov: "Re: [PATCH] fix de_thread() vs do_coredump() deadlock"
In reply to: Yi Yang: "Re: [2.6.16 PATCH] Filessytem Events Reporter V3"
Next in thread: Yi Yang: "Re: [2.6.16 PATCH] Filessytem Events Reporter V3"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]