Revisit AF_BUS: is it a better way to implement KDBUS?

From: cee1
Date: Thu Jul 30 2015 - 09:09:52 EST


Hi all,

I'm interested in the idea of AF_BUS.

There have already been various discussions about it:
* Missing the AF_BUS - https://lwn.net/Articles/504970/
* Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel -
http://lwn.net/Articles/537021/
* presentation-kdbus -
https://github.com/gregkh/presentation-kdbus/blob/master/kdbus.txt
* Re: [GIT PULL] kdbus for 4.1-rc1 - https://lwn.net/Articles/641278/
* The kdbuswreck - https://lwn.net/Articles/641275/

I'm wondering whether it is a better approach, that is, a general mechanism
for implementing various __Bus__-oriented IPCs, such as Binder[2],
DBus[1], etc.

The original design of AF_BUS is at
https://github.com/Airtau/genivi/blob/master/af_bus-linux/0002-net-bus-Add-AF_BUS-documentation.patch.
What follows is my version of AF_BUS.

Some characteristics of a Bus-oriented IPC:
1. A process creates a Bus; that process is then called the 'bus master'.
2. Other processes connect to the Bus and are assigned Bus address(es).
3. Messages can be sent and received as multicast, in addition to P2P communication.
4. The implementation may be based on a shared-memory model to avoid unnecessary copies.

## How to map point 1: """A process creates a Bus; that process is then called the 'bus master'"""
### The [bus master] acts:

struct sockaddr_bus {
sa_family_t sbus_family; /* AF_BUS */
unsigned short sbus_addr_ncomp; /* number of components of sbus_addr */
char sbus_path[BUS_PATH_MAX]; /* pathname of this bus */
uint64_t sbus_addr[BUS_ADDR_COMP_MAX]; /* address within the bus */
};
#define BUS_ADDR_MAX (BUS_ADDR_COMP_MAX * sizeof(uint64_t))

char bus_path[] = "/tmp/test"; /* non-abstract path */
char bus_addr[] = "org.example.bus";
struct sockaddr_bus addr = { .sbus_family = AF_BUS };

strncpy(addr.sbus_path, bus_path, BUS_PATH_MAX - 2);
memcpy(addr.sbus_addr, bus_addr, MIN(sizeof(bus_addr), BUS_ADDR_MAX));
addr.sbus_addr_ncomp = MIN(ALIGN(sizeof(bus_addr), 8) / 8, BUS_ADDR_COMP_MAX);

int bus_fd = socket(AF_BUS, SOCK_DGRAM, 0);
/* creates a Bus, becomes the master of the bus */
bind(bus_fd, (struct sockaddr *) &addr, sizeof(struct sockaddr_bus));
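
A minimal sketch of the address-packing step above. AF_BUS is not in mainline, so the struct is declared locally, and the values chosen for BUS_PATH_MAX and BUS_ADDR_COMP_MAX are assumptions. The helper name bus_addr_pack is made up for illustration. Unlike the strncpy/MIN code above, it rejects oversized input instead of silently truncating it, and it leaves sbus_family to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* sa_family_t */

/* Assumed limits: AF_BUS was never merged, so these are placeholders. */
#define BUS_PATH_MAX       108
#define BUS_ADDR_COMP_MAX  8
#define BUS_ADDR_MAX       (BUS_ADDR_COMP_MAX * sizeof(uint64_t))

struct sockaddr_bus {
    sa_family_t    sbus_family;                   /* AF_BUS (set by caller) */
    unsigned short sbus_addr_ncomp;               /* components used in sbus_addr */
    char           sbus_path[BUS_PATH_MAX];       /* pathname of this bus */
    uint64_t       sbus_addr[BUS_ADDR_COMP_MAX];  /* address within the bus */
};

/*
 * Hypothetical helper: copy a bus path and a textual bus address into sa.
 * Returns the number of 64-bit components used, or -1 if either string
 * does not fit.
 */
static int bus_addr_pack(struct sockaddr_bus *sa, const char *path,
                         const char *name)
{
    size_t path_len, name_len;

    assert(sa != NULL && path != NULL && name != NULL);

    path_len = strlen(path);
    name_len = strlen(name) + 1;   /* include the NUL, like sizeof(bus_addr) */

    /* Keep two spare bytes in sbus_path, mirroring BUS_PATH_MAX - 2 above. */
    if (path_len > BUS_PATH_MAX - 2 || name_len > BUS_ADDR_MAX)
        return -1;

    memset(sa, 0, sizeof(*sa));
    memcpy(sa->sbus_path, path, path_len);
    memcpy(sa->sbus_addr, name, name_len);

    /* Round the byte length up to whole 8-byte components. */
    sa->sbus_addr_ncomp = (unsigned short)((name_len + 7) / 8);
    return sa->sbus_addr_ncomp;
}
```

With these assumptions, "org.example.bus" (15 characters plus the NUL) occupies two components and "1.1" occupies one.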


## How to map point 2: """Other processes connect to the Bus and are assigned Bus address(es)"""
### The [bus endpoint] acts:
fd = socket(AF_BUS, SOCK_DGRAM, 0);

/* AUTH message setup */
struct msghdr msghdr = {
.msg_name = &addr, /* bus master's addr */
.msg_namelen = sizeof(struct sockaddr_bus),
.msg_iov = &auth_iovec,
.msg_iovlen = 1,
};

msghdr.msg_controllen = CMSG_SPACE(sizeof(struct ucred));
msghdr.msg_control = alloca(msghdr.msg_controllen);
cmsg = CMSG_FIRSTHDR(&msghdr);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_CREDENTIALS;
cmsg->cmsg_len = CMSG_LEN(sizeof(struct ucred));
ucred = (struct ucred *) CMSG_DATA(cmsg);
ucred->pid = getpid();
ucred->uid = getuid();
ucred->gid = getgid();

sendmsg(fd, &msghdr, MSG_NOSIGNAL);
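
The credential-passing part of this handshake already exists for AF_UNIX, so it can be tried today. The following is a sketch under that substitution: it uses an AF_UNIX datagram socketpair in place of AF_BUS, one end standing in for the endpoint and the other for the bus master. The function name creds_roundtrip is made up for illustration.

```c
#define _GNU_SOURCE           /* struct ucred, SCM_CREDENTIALS */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Send one byte plus our credentials from sv[0] and receive them on sv[1].
 * Returns 0 and fills *out on success, -1 on failure.
 */
static int creds_roundtrip(struct ucred *out)
{
    /* The union keeps the control buffers aligned for struct cmsghdr. */
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } tx, rx;
    char byte = 'A', rbyte = 0;
    struct iovec tiov = { .iov_base = &byte, .iov_len = 1 };
    struct iovec riov = { .iov_base = &rbyte, .iov_len = 1 };
    struct msghdr tmsg = {
        .msg_iov = &tiov, .msg_iovlen = 1,
        .msg_control = tx.buf, .msg_controllen = sizeof(tx.buf),
    };
    struct msghdr rmsg = {
        .msg_iov = &riov, .msg_iovlen = 1,
        .msg_control = rx.buf, .msg_controllen = sizeof(rx.buf),
    };
    struct ucred me = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
    struct cmsghdr *cmsg;
    int sv[2], on = 1, rc = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* The receiver must opt in before the message is sent. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) < 0)
        goto out;

    memset(tx.buf, 0, sizeof(tx.buf));
    cmsg = CMSG_FIRSTHDR(&tmsg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_CREDENTIALS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(me));
    memcpy(CMSG_DATA(cmsg), &me, sizeof(me));

    if (sendmsg(sv[0], &tmsg, MSG_NOSIGNAL) != 1)
        goto out;
    if (recvmsg(sv[1], &rmsg, 0) != 1)
        goto out;

    /* Walk the received control messages looking for the credentials. */
    for (cmsg = CMSG_FIRSTHDR(&rmsg); cmsg; cmsg = CMSG_NXTHDR(&rmsg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_CREDENTIALS) {
            memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
            rc = 0;
        }
    }
out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

The kernel checks the pid/uid/gid supplied by an unprivileged sender, which is what would let a bus master trust them during AUTH.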

### The [bus master] acts:
int optval = 1;
setsockopt(bus_fd, SOL_SOCKET, SO_PASSCRED, &optval, sizeof(optval));
recvmsg(bus_fd, &msghdr, 0);

/* do AUTH ... */

msghdr.msg_iov = &reply_iovec;
msghdr.msg_iovlen = 1;
msghdr.msg_controllen = 0;
msghdr.msg_control = NULL;

if (auth_ok) {
/* bus master allocates a bus addr */
char bus_path[] = "/tmp/test";
char ret_bus_addr[] = "1.1";
struct sockaddr_bus ret_addr = { .sbus_family = AF_BUS };

strncpy(ret_addr.sbus_path, bus_path, BUS_PATH_MAX - 2);
memcpy(ret_addr.sbus_addr, ret_bus_addr,
MIN(sizeof(ret_bus_addr), BUS_ADDR_MAX));
ret_addr.sbus_addr_ncomp = MIN(ALIGN(sizeof(ret_bus_addr), 8)
/ 8, BUS_ADDR_COMP_MAX);

/*
* 1. bus master returns the bus addr
* 2. kernel will apply it against the bus endpoint
* 3. the bus endpoint is then able to talk with endpoints on the bus.
*/
msghdr.msg_controllen = CMSG_SPACE(sizeof(struct sockaddr_bus));
msghdr.msg_control = alloca(msghdr.msg_controllen);
cmsg = CMSG_FIRSTHDR(&msghdr);
cmsg->cmsg_level = BUS_SOCKET;
cmsg->cmsg_type = SCM_OWNED_ADDR;
cmsg->cmsg_len = CMSG_LEN(sizeof(struct sockaddr_bus));
memcpy(CMSG_DATA(cmsg), &ret_addr, sizeof(struct sockaddr_bus));
}
sendmsg(bus_fd, &msghdr, MSG_NOSIGNAL);


## How to map point 3: """Messages can be sent and received as multicast, in addition to P2P communication"""
### P2P communication
Sometimes a bus endpoint may be assigned multiple addresses, and it may
want to send a message from a specific one.

struct msghdr msghdr = {
.msg_name = &dst_addr,
.msg_namelen = sizeof(struct sockaddr_bus),
.msg_iov = &msg_iovec,
.msg_iovlen = 1,
};

char bus_path[] = "/tmp/test";
char bus_addr[] = "com.example.service1";
struct sockaddr_bus src_addr = { .sbus_family = AF_BUS };

strncpy(src_addr.sbus_path, bus_path, BUS_PATH_MAX - 2);
memcpy(src_addr.sbus_addr, bus_addr, MIN(sizeof(bus_addr), BUS_ADDR_MAX));
src_addr.sbus_addr_ncomp = MIN(ALIGN(sizeof(bus_addr), 8) / 8,
BUS_ADDR_COMP_MAX);

msghdr.msg_controllen = CMSG_SPACE(sizeof(struct sockaddr_bus));
msghdr.msg_control = alloca(msghdr.msg_controllen);
cmsg = CMSG_FIRSTHDR(&msghdr);
cmsg->cmsg_level = BUS_SOCKET;
cmsg->cmsg_type = SCM_SRC_ADDR;
cmsg->cmsg_len = CMSG_LEN(sizeof(struct sockaddr_bus));
memcpy(CMSG_DATA(cmsg), &src_addr, sizeof(struct sockaddr_bus));

sendmsg(my_sock_fd, &msghdr, MSG_NOSIGNAL);

### Multicast
The multicast address may look like:
{
.sbus_family = AF_BUS,

/* In a multicast addr, its bus_path is '*'-terminated */
.sbus_path = "/tmp/test\0\0\0\0\0...*",

.sbus_addr_ncomp = 8,
.sbus_addr = /* 8 * 64bits bitarray for example */
}

The receiver asks the [bus master] for permission to receive
messages from a set of multicast addresses, and the bus master grants
it by replying with a control message:
{
.cmsg_level = BUS_SOCKET,
.cmsg_type = SCM_MULTICAST_MATCH,
.cmsg_data = /* the requested struct sockaddr_bus */
}

How does matching happen?
Let's assume someone sends a message to multicast address maddr1, and
the receiver has been granted a match of maddr2:

The [kernel]:
is_matched = (maddr1 & maddr2) == maddr2;

In this way, userspace can deploy bloom filters, and it may then
apply eBPF to filter out the "false positive" cases.
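
To make the matching rule concrete, here is a small userspace sketch of it over an array of 64-bit components. The component count, the function names, and the use of two seeded FNV-1a hashes to pick bloom bits are all arbitrary choices for illustration, not part of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUS_ADDR_COMP_MAX 8   /* assumed: 8 * 64 bits, as in the example */

/* True when every bit set in match is also set in dst. */
static bool bus_mcast_match(const uint64_t *dst, const uint64_t *match,
                            size_t ncomp)
{
    for (size_t i = 0; i < ncomp; i++)
        if ((dst[i] & match[i]) != match[i])
            return false;
    return true;
}

/* 64-bit FNV-1a, with a seed mixed in to derive several hash functions. */
static uint64_t fnv1a64(const char *s, uint64_t seed)
{
    uint64_t h = UINT64_C(14695981039346656037) ^ seed;

    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= UINT64_C(1099511628211);
    }
    return h;
}

/* Bloom insertion: set two bits chosen by the key. */
static void bloom_add(uint64_t *bits, size_t ncomp, const char *key)
{
    assert(ncomp > 0);
    for (uint64_t k = 0; k < 2; k++) {
        uint64_t bit = fnv1a64(key, k) % (ncomp * 64);

        bits[bit / 64] |= UINT64_C(1) << (bit % 64);
    }
}
```

A sender would bloom_add() every property of the message into the destination address, and a receiver would bloom_add() only the properties it subscribes to. A subscription whose bits are a subset of the message bits always matches. Unrelated keys can occasionally collide, which is the false-positive case left to a later filter.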

## How to avoid unnecessary copies?
A sockopt similar to PACKET_RX_RING[3] could be introduced, providing
an mmap/shared-memory style API.


## Other thoughts
1. The bus master may want to receive notifications from the kernel,
such as "a bus endpoint died". A special sockaddr_bus "{
.sbus_addr_ncomp = 0, .sbus_addr = NULL }" indicates a message from
the kernel.
2. A bus endpoint may pass a memfd to another bus endpoint, so that
the two can communicate through mmap/shared memory when they need
ultimate performance.
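
The memfd idea in point 2 can be sketched with existing Linux APIs. In the sketch below both mappings live in one process to keep it short: the second, read-only mapping stands in for the receiving endpoint, which in practice would obtain the fd through an SCM_RIGHTS control message. The function name memfd_roundtrip and the memfd name "bus-payload" are made up for illustration.

```c
#define _GNU_SOURCE          /* memfd_create */
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write payload through one mapping of a memfd and read it back through a
 * second mapping. Returns the number of bytes copied into out (including
 * the NUL), or -1 on failure.
 */
static ssize_t memfd_roundtrip(const char *payload, char *out, size_t out_len)
{
    size_t len = strlen(payload) + 1;
    ssize_t rc = -1;
    char *tx, *rx;
    int fd;

    if (len > out_len)
        return -1;

    fd = memfd_create("bus-payload", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0)
        goto out_close;

    /* Sender side: writable shared mapping. */
    tx = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (tx == MAP_FAILED)
        goto out_close;

    /* Receiver side: read-only mapping of the same pages. */
    rx = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (rx == MAP_FAILED)
        goto out_unmap_tx;

    memcpy(tx, payload, len);   /* the only copy into the shared pages */
    memcpy(out, rx, len);       /* the receiver sees it without a send */
    rc = (ssize_t)len;

    munmap(rx, len);
out_unmap_tx:
    munmap(tx, len);
out_close:
    close(fd);
    return rc;
}
```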



---
1. http://www.freedesktop.org/wiki/Software/dbus/
2. http://elinux.org/Android_Binder
3. http://man7.org/linux/man-pages/man7/packet.7.html



Regards,

- cee1