Re: [RFC PATCH v3 3/3] devguard: added device guard for mknod in non-initial userns

From: Christian Brauner
Date: Fri Dec 15 2023 - 09:15:55 EST


On Fri, Dec 15, 2023 at 02:26:53PM +0100, Michael Weiß wrote:
> On 15.12.23 13:31, Christian Brauner wrote:
> > On Wed, Dec 13, 2023 at 03:38:13PM +0100, Michael Weiß wrote:
> >> devguard is a simple LSM to allow CAP_MKNOD in non-initial user
> >> namespace in cooperation of an attached cgroup device program. We
> >> just need to implement the security_inode_mknod() hook for this.
> >> In the hook, we check if the current task is guarded by a device
> >> cgroup using the lately introduced cgroup_bpf_current_enabled()
> >> helper. If so, we strip out SB_I_NODEV from the super block.
> >>
> >> Access decisions to those device nodes are then guarded by existing
> >> device cgroups mechanism.
> >>
> >> Signed-off-by: Michael Weiß <michael.weiss@xxxxxxxxxxxxxxxxxxx>
> >> ---
> >
> > I think you misunderstood me... My point was that I believe you don't
> > need an additional LSM at all and no additional LSM hook. But I might be
> > wrong. Only a POC would show.
>
> Yeah sorry, I got your point now.

I think I might have had a misconception about how this works.
A bpf LSM program can't easily alter a kernel object such as struct
super_block I've been told.

>
> >
> > Just write a bpf lsm program that strips SB_I_NODEV in the existing
> > security_sb_set_mnt_opts() call which is guranteed to be called when a
> > new superblock is created.
>
> This does not work since SB_I_NODEV is a required_iflag in
> mount_too_revealing(). This I have already tested when writing the
> simple LSM here. So maybe we need to drop SB_I_NODEV from required_flags
> there, too. Would that be safe?

Right. I think we might be able to add a new SB_I_MANAGED_DEVICES flag.
__UNTESTED, UNCOMPILED_

diff --git a/fs/namespace.c b/fs/namespace.c
index fbf0e596fcd3..e87cc0320091 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4887,7 +4887,6 @@ static bool mnt_already_visible(struct mnt_namespace *ns,

static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags)
{
- const unsigned long required_iflags = SB_I_NOEXEC | SB_I_NODEV;
struct mnt_namespace *ns = current->nsproxy->mnt_ns;
unsigned long s_iflags;

@@ -4899,9 +4898,13 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
if (!(s_iflags & SB_I_USERNS_VISIBLE))
return false;

- if ((s_iflags & required_iflags) != required_iflags) {
- WARN_ONCE(1, "Expected s_iflags to contain 0x%lx\n",
- required_iflags);
+ if (!(s_iflags & SB_I_NOEXEC)) {
+ WARN_ONCE(1, "Expected s_iflags to contain SB_I_NOEXEC\n");
+ return true;
+ }
+
+ if (!(s_iflags & (SB_I_NODEV | SB_I_MANAGED_DEVICES))) {
+ WARN_ONCE(1, "Expected s_iflags to contain device access mask\n");
return true;
}

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 98b7a7a8c42e..6ca0fe922478 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1164,6 +1164,7 @@ extern int send_sigurg(struct fown_struct *fown);
#define SB_I_USERNS_VISIBLE 0x00000010 /* fstype already mounted */
#define SB_I_IMA_UNVERIFIABLE_SIGNATURE 0x00000020
#define SB_I_UNTRUSTED_MOUNTER 0x00000040
+#define SB_I_MANAGED_DEVICES 0x00000080

#define SB_I_SKIP_SYNC 0x00000100 /* Skip superblock at global sync */
#define SB_I_PERSB_BDI 0x00000200 /* has a per-sb bdi */

>
> >
> > Store your device access rules in a bpf map or in the sb->s_security
> > blob (This is where I'm fuzzy and could use a bpf LSM expert's input.).
> >
> > Then make that bpf lsm program kick in everytime a
> > security_inode_mknod() and security_file_open() is called and do device
> > access management in there. Actually, you might need to add one hook
> > when the actual device that's about to be opened is know.
> > This should be where today the device access hooks are called.
> >
> > And then you should already be done with this. The only thing that you
> > need is the capable check patch.
> >
> > You don't need that cgroup_bpf_current_enabled() per se. Device
> > management could now be done per superblock, and not per task. IOW, you
> > allowlist a bunch of devices that can be created and opened. Any task
> > that passes basic permission checks and that passes the bpf lsm program
> > may create device nodes.
> >
> > That's a way more natural device management model than making this a per
> > cgroup thing. Though that could be implemented as well with this.
> >
> > I would try to write a bpf lsm program that does device access
> > management with your capable() sysctl patch applied and see how far I
> > get.
> >
> > I don't have the time otherwise I'd do it.
> I'll give it a try but no promises how fast this will go.

No worries. We're entering the holiday season anyway.