Re: [PATCH 2/2] net: Implement SO_PASSCGROUP to enable passing cgroup path

From: Andy Lutomirski
Date: Mon Apr 21 2014 - 11:48:19 EST


On Mon, Apr 21, 2014 at 8:03 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> So what happened to logger use case where logger accepts stream
> connections and logs the cgroup of client too.
>
> W.r.t systemd, looks like journald is accepting connections at
> /run/systemd/journal/stdout. (stdout_stream_new() and
> server_open_stdout_socket()).

See stdout_stream_line. As far as I can tell, journald already
implements this in a mostly sensible manner, with no help from the
kernel required.

On my system, journalctl -f -o verbose says:

Mon 2014-04-21 08:34:52.732065 PDT
[s=4970edca25b4456d80b00e6e4cefd94b;i=2010;b=2d2454632c0f4f998a8d0158156ab743;m=66f5d274a;t=4f78f3d9a11a1;x=9902671f5a7e7bcc]
_UID=0
_BOOT_ID=2d2454632c0f4f998a8d0158156ab743
[...]
_GID=500
_AUDIT_SESSION=1
_AUDIT_LOGINUID=500
_SYSTEMD_CGROUP=/user.slice/user-500.slice/session-1.scope
_SYSTEMD_SESSION=1
_SYSTEMD_OWNER_UID=500
_SYSTEMD_UNIT=session-1.scope
_SYSTEMD_SLICE=user-500.slice
SYSLOG_IDENTIFIER=sudo
_COMM=sudo
_EXE=/usr/bin/sudo
_SELINUX_CONTEXT=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
MESSAGE=luto : TTY=pts/1 ; PWD=/home/luto/apps/systemd ; USER=root
; COMMAND=/usr/bin/journalctl -f -a
_PID=32393
_CMDLINE=sudo journalctl -f -a
_SOURCE_REALTIME_TIMESTAMP=1398094492732065

Unfortunately, the code in journald seems to be rather buggy and
prefers the unit that it derives from the (racy!) cg_path_get_unit
hack over the unit that it *already knows* (search the journald
sources for STDOUT_STREAM_UNIT_ID), but the right fix is to FIX THE
FSCKING JOURNALD BUG, not to change the kernel.

To summarize from my reading of how this crap works:

When a unit is created, systemd opens a stream socket pointing at
/run/systemd/journal/stdout. It tells journald the unit, along with
lots of other useful information. journald records this association
between the socket and the unit. Systemd could tell journald the
cgroup here, too, if it wanted it.

Systemd then starts the unit, passing it the socket as stdout, if
configured to do so.

That unit logs something. Journald then uses the crappy, racy ucred
mechanism to resolve the cgroup, login id, unit, etc.
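
For illustration only, the general shape of a ucred-based lookup can be
sketched as below. This is not journald's code: the helper names
peer_ucred and cgroup_of_pid are made up for the example, and a
socketpair stands in for the real stdout stream. The race is in the
second step: between the kernel recording the pid and the reader opening
/proc/<pid>/cgroup, the process may have exited, the pid may have been
reused, or the process may have been moved to another cgroup.

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid;
UCRED_FMT = "iII"


def peer_ucred(sock):
    """Return (pid, uid, gid) the kernel recorded for the peer of a Unix socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(UCRED_FMT))
    return struct.unpack(UCRED_FMT, raw)


def cgroup_of_pid(pid):
    """Read /proc/<pid>/cgroup as a list of lines, or None if it cannot be read.

    Racy by construction: nothing ties this read to the moment the
    credentials were captured, so the answer may describe a different
    cgroup or even a different process.
    """
    try:
        with open(f"/proc/{pid}/cgroup") as f:
            return f.read().splitlines()
    except OSError:
        return None


# A socketpair stands in for the logger's end and the client's end.
logger_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

pid, uid, gid = peer_ucred(logger_end)   # step 1: ask the kernel who the peer is
cgroup_lines = cgroup_of_pid(pid)        # step 2: look the pid up later (the race)
```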

Your proposals are to either (a) replace that with an almost-as-buggy
SO_PASSCGROUP option or (b) add SO_PEERCGROUP. The latter would allow
journald to figure out the cgroup that opened the socket. The problem
here is two-fold. One: systemd already knows the cgroup it intends to
use, and it can tell journald without kernel help. Two: Systemd seems
to open the stdout socket right before setting the cgroup, so the
kernel's idea of what cgroup opened the socket is crap.
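
The "recorded at open time" problem can be seen today with SO_PEERCRED,
which does exist. In the sketch below, a parent (standing in for the
manager) creates the socket and a forked child (standing in for the
unit) writes to it; the kernel still reports the parent as the peer.
That a SO_PEERCGROUP option would behave the same way for cgroups is an
assumption based on the description above, not something this code
demonstrates. The names manager_pid, writer_pid and recorded_pid are
illustrative.

```python
import os
import socket
import struct


def peer_pid(sock):
    """Return the pid the kernel recorded as the peer of a Unix socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize("iII"))
    return struct.unpack("iII", raw)[0]


manager_pid = os.getpid()

# The "manager" creates the stream before the "unit" process exists.
logger_end, stdout_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

child = os.fork()
if child == 0:
    # The "unit": inherits the socket and reports its own pid over it.
    try:
        stdout_end.sendall(str(os.getpid()).encode() + b"\n")
    finally:
        os._exit(0)

# Back in the parent: read what the child wrote, then ask the kernel
# who it thinks is on the other end of the stream.
writer_pid = int(logger_end.makefile("rb").readline())
recorded_pid = peer_pid(logger_end)

os.waitpid(child, 0)
stdout_end.close()

# recorded_pid is the creator of the socket, not the process that wrote.
```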

The solution to all of this seems straightforward: fix journald to use
the information it already has, trusted, without races, from systemd.
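
As a rough sketch of that fix, and nothing more: the logger keeps
whatever the manager declared when it set up the stream and only falls
back to a pid-based guess when nothing was declared. StreamInfo,
resolve_unit, resolve_cgroup and the fallback callbacks are hypothetical
names for this example and do not correspond to journald's actual
structures.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StreamInfo:
    """Metadata declared by the manager when the stream was created (hypothetical)."""
    declared_unit: Optional[str] = None
    declared_cgroup: Optional[str] = None


def resolve_unit(stream: StreamInfo,
                 derive_from_pid: Callable[[], Optional[str]]) -> Optional[str]:
    """Prefer the declared unit; use the racy pid-based lookup only as a fallback."""
    if stream.declared_unit:
        return stream.declared_unit
    return derive_from_pid()


def resolve_cgroup(stream: StreamInfo,
                   derive_from_pid: Callable[[], Optional[str]]) -> Optional[str]:
    """Same policy for the cgroup, if the manager chose to declare it."""
    if stream.declared_cgroup:
        return stream.declared_cgroup
    return derive_from_pid()


# Example: the declared values win even if the pid-based guess disagrees.
stream = StreamInfo(declared_unit="example.service",
                    declared_cgroup="/system.slice/example.service")
unit = resolve_unit(stream, lambda: "session-1.scope")
cgroup = resolve_cgroup(stream, lambda: "/user.slice/user-500.slice/session-1.scope")

# A stream with nothing declared falls back to the derived value.
fallback_unit = resolve_unit(StreamInfo(), lambda: "session-1.scope")
```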

--Andy