Re: [RFC] Unify KVM kernel-space and user-space code into a single project

From: Avi Kivity
Date: Mon Mar 22 2010 - 13:57:48 EST


On 03/22/2010 07:34 PM, Ingo Molnar wrote:

The 'something trustable and kernel-provided'. The kernel knows nothing
about guest names.
The kernel certainly knows about other resources such as task names or network
interface names or tracepoint names. This is kernel design 101.

But it doesn't know about guest names. You can't trust task names since any user can create a task with any name. Network interfaces are root only so you can trust their names.
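
To illustrate the point about task names: on Linux a process can rename itself without any privilege, for example by writing to /proc/self/comm (prctl(PR_SET_NAME) does the same job). The following is only a minimal sketch, assuming a Linux /proc; the helper name and the name "qemu-guest1" are made up for illustration.

```python
import os


def rename_current_task(new_name):
    """Set this task's name via /proc/self/comm and return what the kernel reports.

    Returns None where /proc/self/comm is missing or not writable
    (for example on non-Linux systems). Hypothetical helper, for illustration only.
    """
    comm_path = "/proc/self/comm"
    if not os.path.exists(comm_path):
        return None
    try:
        # No special privilege is needed: a task may rename itself.
        with open(comm_path, "w") as f:
            f.write(new_name)
        # Read the name back as tools such as ps or top would see it.
        with open(comm_path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print(rename_current_task("qemu-guest1"))
```

Any tool that identifies guests by task name alone would accept such a process as a guest.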

There are dozens or even hundreds of object classes the kernel does not know about and cannot enumerate. User names, for instance. X sessions. Windows (the screen artifact, not the OS). CIFS shares exported by this machine. Currently running applications (not processes).

btw, network interfaces would have been much better off using /dev/netif/name rather than having their own namespace, IMO, like disks.


[...] I don't like using the term, because sometimes the layers are
incorrect and need to be violated. But it should be done explicitly, not
as a shortcut for a minor feature (and profiling is a minor feature, most
users will never use it, especially guest-from-host).

The fact is we have well defined layers today, kvm virtualizes the cpu
and memory, qemu emulates devices for a single guest, libvirt manages
guests. We break this sometimes but there has to be a good reason. So
perf needs to talk to libvirt if it wants names. Could be done via
linking, or can be done using a plugin libvirt drops into perf.
This is really just the much-discredited microkernel approach for keeping
global enumeration data that should be kept by the kernel ...

I disagree that it should be kept in the kernel. Why introduce a new namespace, with APIs to query it and manage it, rules regarding conflicts, and then virtualize it for containers?

Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony.
There are numerous ways that this can break:

I don't like it either. We have libvirt for enumerating guests.

- Those special files can get corrupted, mis-setup, get out of sync, or can
be hard to discover.

- The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
design flaw: it is per user. When I'm root I'd like to query _all_ current
guest images, not just the ones started by root. A system might not even
have a notion of '${HOME}'.

- Apps might start KVM vcpu instances without adhering to the
${HOME}/.qemu/qmp/ access method.

- it doesn't work with nfs.

- There is no guarantee for the Qemu process to reply to a request - while
the kernel can always guarantee an enumeration result. I don't want 'perf
kvm' to hang or misbehave just because Qemu has hung.

If qemu doesn't reply, your guest is dead anyway.
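
For reference, the directory-based enumeration being debated could look roughly like the sketch below. The thread does not specify the layout of ${HOME}/.qemu/qmp/, so the sketch assumes one UNIX socket per running guest, named after the guest; the function name is hypothetical as well.

```python
import os
import stat


def list_guest_sockets(home):
    """Return the sorted names of UNIX sockets under <home>/.qemu/qmp/.

    Assumed layout (not given in the thread): one socket per running guest.
    """
    qmp_dir = os.path.join(home, ".qemu", "qmp")
    try:
        entries = list(os.scandir(qmp_dir))
    except (FileNotFoundError, NotADirectoryError):
        # This user has no such directory, so no guests are visible here.
        return []
    guests = []
    for entry in entries:
        # Do not follow symlinks, so a dangling link cannot raise here.
        mode = entry.stat(follow_symlinks=False).st_mode
        if stat.S_ISSOCK(mode):
            guests.append(entry.name)
    return sorted(guests)


if __name__ == "__main__":
    print(list_guest_sockets(os.path.expanduser("~")))
```

The scan covers only the one home directory it is given, which is the per-user limitation raised above, and a stale socket left behind by a dead qemu would still be listed.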

Really, for such reasons user-space is pretty poor at doing system-wide
enumeration and resource management. Microkernels lost for a reason.

Take a look at your desktop: userspace is doing all of that everywhere, from enumerating users and groups to deciding how your disks are named. The kernel only provides the bare facilities.
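
User enumeration is a concrete case: the kernel tracks numeric uids only, and the mapping to names comes from the userspace account database (/etc/passwd, or whatever the C library's name service is configured to use). A minimal sketch using Python's standard pwd module; the function name is made up for illustration.

```python
import pwd


def enumerate_users():
    """Map uid -> user name using the userspace account database."""
    # If several names share a uid, the last entry returned wins.
    return {entry.pw_uid: entry.pw_name for entry in pwd.getpwall()}


if __name__ == "__main__":
    for uid, name in sorted(enumerate_users().items()):
        print(uid, name)
```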

You are committing several grave design mistakes here.

I am committing on the shoulders of giants.

--
error compiling committee.c: too many arguments to function
