Re: [GIT PULL] kdbus for 4.1-rc1

From: Harald Hoyer
Date: Wed Apr 29 2015 - 08:48:49 EST


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 29.04.2015 01:12, John Stoffel wrote:
> LDAP is pretty damn generic, in that you can put pretty large objects into
> it, and pretty large OUs, etc. So why would it be a candidate for going
> into the kernel? And why is kdbus so important in the kernel as well?
> People have talked about it needing to be there for bootup, but isn't that
> why we ripped out RAID detection and such from the kernel and built
> initramfs, so that there's LESS in the kernel, and more in an early
> userspace? Same idea with dbus in my opinion.

Let me elaborate on the initramfs/shutdown situation a little bit more,
because I have to deal with that every day.

Because of the "let's move everything to userspace" sentiment, we are now in
a situation where we need a lot of tools to set up the root device.

Be it LVM on IMSM or iSCSI multipath, the initramfs has to set up the network
(with bridging, bonding, etc.), establish the iSCSI connection, assemble the
RAID, activate LVM, open crypto devices, etc.
And if something goes wrong, you want to have a shell, see all the logs and
debug things.

Over time we have moved away from simple shell scripts (without any
logging) and statically compiled special versions for the initramfs to a mini
distribution in the initramfs, which simplifies maintenance and improves
reliability.

Basically you want to use the same tools in the initramfs (and shutdown)
which you already have and use in your real root, with the same configuration
files and the same interfaces and the same code paths.

Therefore systemd is started in a dracut-created initramfs, and it in turn
starts journald for logging. The same basic systemd targets exist in the
initramfs as on the real root, so normally you don't have to cope with
specialized versions for the initramfs.

The goal here is to have the same IPC mechanism from the very beginning to
the very end. No crappy fallback mechanisms in case a daemon is not running
or has crashed, no creepy transition from initramfs root to real root to
shutdown root.

We already have such transitions today for systemd, journald, mdmon [1], etc.:
systemd has to serialize itself, journald's file descriptors are handed
over, and mdmon jumps through hoops. Remember that you want to get rid of open
files and executables, and therefore have to re-exec everything, when you
transition from the initramfs root to the real root, and again from the real
root to the shutdown root.
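
To make the "serialize and re-exec" handover concrete, here is a minimal
sketch in Python. It is not systemd's actual code; the function names and the
"--deserialize" argument are hypothetical and chosen only for illustration.
The idea is to write the state to an unlinked temporary file, keep its file
descriptor open across exec, and let the new executable read the state back.

```python
import json
import os
import sys
import tempfile


def serialize_state(state):
    """Write state to an unlinked temp file and return an inheritable fd."""
    with tempfile.TemporaryFile() as tmp:
        tmp.write(json.dumps(state).encode("utf-8"))
        tmp.flush()
        # The duplicate stays valid after tmp is closed.
        fd = os.dup(tmp.fileno())
    # Python marks new fds close-on-exec; undo that so exec keeps it open.
    os.set_inheritable(fd, True)
    return fd


def deserialize_state(fd):
    """Read the state back from the inherited fd and close it."""
    os.lseek(fd, 0, os.SEEK_SET)
    with os.fdopen(fd, "rb") as f:
        return json.loads(f.read().decode("utf-8"))


def reexec(state, script):
    """Replace the current process image, handing the state over by fd.

    Nothing from the old root needs to stay open except this one fd.
    The "--deserialize" flag is an assumption of this sketch.
    """
    fd = serialize_state(state)
    os.execv(sys.executable, [sys.executable, script, "--deserialize", str(fd)])
```

The re-executed process would parse the fd number from its arguments and call
deserialize_state() on it. Every long-running daemon that has to survive the
root switch needs some variant of this dance.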

We really don't want the IPC mechanism to be in a state of flux, because all
tools would have to fall back to a non-standard mechanism in that case.

If I have to pull a dbus daemon into the initramfs, we still have the
chicken-and-egg problem of PID 1 talking to the logging daemon and starting
dbus:
systemd cannot talk to journald via dbus unless dbus-daemon is started, dbus
cannot log anything on startup if journald is not running, etc.

dbus-daemon would have to transition to the real root, and from the real root
to the shutdown root, without losing state.

Of course this can all be done, but it would involve fallback mechanisms,
which we want to get rid of. Hopefully, you are not suggesting merging dbus
into PID 1. Also, with a daemon you would lose the points mentioned in the
cover mail:

* Security: The peers which communicate do not have to trust each
other, as the only trustworthy component in the game is the kernel
which adds metadata and ensures that all data passed as payload is
either copied or sealed, so that the receiver can parse the data
without having to protect against changing memory while parsing
buffers. Also, all the data transfer is controlled by the kernel,
so that LSMs can track and control what is going on, without
involving userspace. Because of the LSM issue, security people are
much happier with this model than the current scheme of having to
hook into dbus to mediate things.

* Being in the kernel closes a lot of races which can't be fixed with
the current userspace solutions. For example, with kdbus, there is a
way a client can disconnect from a bus, but do so only if no further
messages are present in its queue, which is crucial for implementing
race-free "exit-on-idle" services.

* Eavesdropping on the kernel level, so privileged users can hook into
the message stream without hacking support for that into their
userspace processes

* A number of smaller benefits: for example kdbus learned a way to peek
full messages without dequeuing them, which is really useful for
logging metadata when handling bus-activation requests.
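
The "exit-on-idle" and "peek" points can be illustrated with a small
userspace model in Python. This is only a sketch and not the kdbus API: the
class and method names are hypothetical, and a lock stands in for the
atomicity the kernel would provide. The point is that "is the queue empty?"
and "disconnect" happen as one step, so a message can never slip in between
the check and the exit.

```python
import threading
from collections import deque


class Connection:
    """Toy model of a bus connection with an atomic disconnect-if-idle."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queue = deque()
        self._connected = True

    def deliver(self, msg):
        """Queue a message; fail if the receiver has already disconnected."""
        with self._lock:
            if not self._connected:
                # The sender learns about the failure, so the service can be
                # activated again instead of the message being lost.
                return False
            self._queue.append(msg)
            return True

    def peek(self):
        """Return the next message without dequeuing it, or None."""
        with self._lock:
            return self._queue[0] if self._queue else None

    def receive(self):
        """Dequeue and return the next message, or None."""
        with self._lock:
            return self._queue.popleft() if self._queue else None

    def disconnect_if_idle(self):
        """Disconnect only if nothing is queued; check and act atomically."""
        with self._lock:
            if self._queue:
                return False
            self._connected = False
            return True
```

A service built on this would loop over receive() and call
disconnect_if_idle() before exiting. If the call returns False, there is more
work to do and the service keeps running. With a separate "check" followed by
"disconnect", a message arriving in between would be silently dropped.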

I don't care if the kdbus speedup is only marginal.

In my ideal world, there is a standard IPC mechanism from the beginning to
the end, which does not rely on any process running (except the kernel) and
which is used by _all_ tools, be it a system daemon providing information and
interfaces about device assembly or network setup tools or end user desktop
processes.

dbus _is_ such an easy, flexible, standard IPC mechanism. Of course, you can
reinvent the wheel (NIH, "we know better") and wait and see whether that
works out. Until then the whole common IPC problem remains unresolved, and
Linux distributions are just collections of random software with no common
interoperability and home-grown interfaces.

[1] Transitioning mdmon is one of the critical parts for an IMSM RAID array.
Also, running an executable from a disk that the executable itself is
monitoring, and that stops functioning if the executable is not responding,
is insane.

Thanks for reading...


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJVQNL1AAoJEANOs3ABTfJw0BUQAJgj2RNKR8L7xVPwH2GovmST
nioOl6sg9u2m8NgYM6TJUJI3yHbOOiRVCTXHb9fmkTk/hBDxsT+X4lFevh0mDLJu
Y5bk1RwGn8Ail3GLR6il9RhlNMKEqN2Ik4Ey26IdxQkOhIIAy9IfrNBdsdoNpJ7I
P7qhP8J1DKfmIlgryrXy/mTZ1Nl1m6UlpMZHDSqlnPWuT/iJn0wORbs319fgAQx/
kkPvgSqTGkDetHGNzYmghgRzimNBR5ZftH0HS3Chq6rXPiSbdct/dE8VkQRiEWYo
k6tE83qJr9KbSdBFqnbznVaOpTCQatdanVPBzzz4DTkuSKBlAxIbdXRaFsJCSnKp
7r+h8q+AgdALJXEyx5AyBeh8/dK1a/PsMzOtYZg6FXAz211geTxHeY8bTdOrzys9
kJGwUbbq4rIyvseEl53+Ugh2qZQptDKCj6F46H3iuhsOyUbPXzg1E7K8w2gApwSY
L/eLEcQw+TApULyEhDrQqXlFBPz4vFP38mHNQ6T1Yt3sJuVoU12dOQNN6836htpe
h4ijpaTbUkFV8b/7xgGqOlSBio4iSppybXfiBtHT7NBa4da1L+WG0xT+nR8RSMxd
Gblt9ECZmbay6SIMYQBhntZD5Hs76iSJl0j2i9zg8E1pBw8O5w0jvlA02fOz2pkp
wQsPrxNdlkBxFHVtf/3V
=Dc23
-----END PGP SIGNATURE-----