Re: [PATCH v2] iw_cxgb3: Support "iwarp-only" interfaces to avoid4-tuple conflicts.

From: Steve Wise
Date: Mon Sep 17 2007 - 11:25:24 EST




Evgeniy Polyakov wrote:
Hi Steve.

On Sat, Sep 15, 2007 at 10:56:46AM -0500, Steve Wise (swise@xxxxxxxxxxxxxxxxxxxxx) wrote:
The iWARP driver must translate all listens on address 0.0.0.0 to the
set of rdma-only ip addresses for the device in question. This prevents
incoming connect requests to the TCP ipaddresses from going up the
rdma stack.
If the only solutions to solve a problem with hardware are to steal
packets or became a real device, then real device is much more
appropriate. Is that correct?

This is a real device. I don't understand your question? Packets aren't being stolen.

I meant port from main network stack. Sorry for confusion.

+static void insert_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa)
+{
+ struct iwch_addrlist *addr;
+
+ addr = kmalloc(sizeof *addr, GFP_KERNEL);
As a small nitpick: this wants to be sizeof(struct in_ifaddr)

No, insert_ifa() allocates a struct iwch_addrlist, which has 2 fields: a list_head for linking, and a struct in_ifaddr pointer.

sizeof(struct iwch_addrlist) of course, not (*addr).
To simplify grep.
+ if (!addr) {
+ printk(KERN_ERR MOD "%s - failed to alloc memory!\n",
+ __FUNCTION__);
+ return;
+ }
+ addr->ifa = ifa;
+ mutex_lock(&rnicp->mutex);
+ list_add_tail(&addr->entry, &rnicp->addrlist);
+ mutex_unlock(&rnicp->mutex);
+}
What about providing error back to caller and fail to register?

There are two causes where this is called: 1) during module init to populate the list of iwarp addresses. If we failed in that case then, I _could_ then not register. 2) we get called via the notifier mechanism when an address is added. If that fails, the caller doesn't care (since we're on the notifier callout thread). But the code could perhaps unregister the device. I prefer just logging an error in case 2. I'll look into not registering if we cannot get any address due to lack of memory. But there's another case: we load the module and the admin hasn't yet created the ethX:iw interface.

Perhaps I should change the code to only register as a working rdma device _when_ we get at least one ethX:iwY interface created? Whatchathink?

Does second case ends up with problem you described in the initial
e-mail not being fixed?

No, the 2nd case (a failure to get the list of iwarp-only ip addresses) will cause rdma apps to not receive any incoming connections.

Consider we have eth1 with 1.1.1.1/24. Next, the admin creates the iwarp interface thusly:

ifconfig eth1:iw 2.2.2.2 netmask 255.255.255.0 up

The iw_cxgb3 driver gets a netblock notifier event for the addition of the eth1:iw interface, but FAILS to alloc the memory to keep track that 2.2.2.2 is the iwarp-only address.

Next an rdma app binds to 0.0.0.0, port X. The iw_cxgb3 attempts to map 0.0.0.0 to the set of valid iwarp addresses, but there are none. iw_cxgb3 logs a warning saying no addresses are available. The application ends up not listening to any address.

So TCP apps aren't affected.

Also, if the admin notes the error log entry, and re-initializes the eth1:iw interface -and- this time the kmalloc() works, then the existing rdma app bound to 0.0.0.0 port X will then start receiving connect requests to 2.2.2.2 port X.

Hope this makes sense.

+static inline int is_iwarp_label(char *label)
+{
+ char *colon;
+
+ colon = strchr(label, ':');
+ if (colon && !strncmp(colon+1, "iw", 2))
+ return 1;
+ return 0;
+}
I.e. it is not allowed to create ':iw' alias for anyone else?
Well, looks crappy, but if it is the only solution...

It is kinda crappy. But I don't see a better solution. Any ideas?

Does creating the whole new netdevice is a too big overhead, or is it
considered bad idea?

I think its too big overhead, and pretty invasive on the low level cxgb3 driver. I think having a device in the 'ifconfig -a' after iw_cxgb3 is loaded and devices discovered would be a good thing for the admin. This is the angle Roland suggested. I'm just not sure how to implement it.

But if someone could explain how I might create this full netdevice as a pseudo device on top of the real one, maybe I could implement it.

Note that non TCP traffic still needs to utilize this interface for ND to work properly with the RDMA core.



+static struct iwch_listen_entry *alloc_listener(struct iwch_listen_ep *ep,
+ __be32 addr)
Do you know, that cxgb3 function names suck? :)
Especially get_skb().

+{
+ struct iwch_dev *h = to_iwch_dev(ep->com.cm_id->device);
+ struct iwch_listen_entry *le;
+
+ le = kmalloc(sizeof *le, GFP_KERNEL);
Wants to be sizeof(struct iwch_listen_entry) and in other places too.

Do you mean I shouldn't use sizeof *le, but rather sizeof(struct iwch_listen_entry)? Is that the preferred coding style?

Yes, exactly.


Ok, now I get it. :)


Steve.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/