[PATCH] implement in-kernel keys & keyring management [try #2]

From: David Howells
Date: Sat Aug 07 2004 - 11:35:33 EST

Next message: Nicholas Miell: "Re: PATCH: cdrecord: avoiding scsi device numbering for ide devices"
Previous message: Jean-Luc Cooke: "Re: [PATCH] Delete cryptoloop"
In reply to: Trond Myklebust: "Re: [PATCH] implement in-kernel keys & keyring management"
Next in thread: James Morris: "Re: [PATCH] implement in-kernel keys & keyring management [try #2]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

I've put a revised patch at:

http://people.redhat.com/~dhowells/keys/keys-268rc2-2.diff.bz2
Signed-Off-By: David Howells <dhowells@xxxxxxxxxx>

Andrew Morton <akpm@xxxxxxxx> wrote:
> - Why have a separate key_user structure, rather than adding stuff to
> struct user_struct? This appears to be feasible.

It is not. user_struct pins two keyrings; these two keyrings pin the tracking
structure, which if it was merged into user_struct would cause the keyrings to
pin it. Even if those two keyrings themselves are marked such they don't pin
the user_struct, they may contain a key somewhere down the tree that does.

I thought long and hard about that, and unfortunately, I think it's necessary
- unless we delete every key or keyring a user has caused to exist at the time
the user_struct goes away. Alternatively, we could just discard the quota
tracking, but this would mean a user could potentially gain more quota by
logging out.

> - I had a hard time working out the locking which protects key->flags.
> It looks racy.

Hmmm... Thinking about it, you're probably right. I've made it use a write
lock on the key's lock anytime the flags are changed if the key can be
accessed (eg: key_alloc and key_cleanup don't need locks).

> - Are the various system-wide locks going to end up hurting on big SMP?

They shouldn't. They aren't used exclusively very much.

(*) key_serial_lock is only used for key allocation, destruction and
procfs. Key creation shouldn't be a particularly common event.

(*) key_user_lock is used more or less the same, except it's used during call
back to userspace too.

(*) The key types sem is read-locked during key construction, but only locked
exclusively when a type is being added or removed (which should be a very
rare event).

(*) The key session sem should only be locked when a process joins a new
session. This should be very rare too.

(*) The keyring name lock is only called during keyring creation, destruction
and lookup. The first too shouldn't be particularly common, and the last
should only be done at userspace's behest (usually joining a named
session), and so should be very rare.

One problem though: currently each task has the pointer to the process
keyring in the task structure, which means that the process keyring must
exist before CLONE_THREAD happens so that it can be shared. Currently, I
create a new process keyring during fork if necessary and exec.

Maybe what I should do is move some of the thread-group specific stuff
out of the task_struct into a separate structure that's shared between
CLONE_THREAD tasks; this would permit me to only create the process
keyring when necessary.

Alternatively, I could just not create a process keyring until it's
required or CLONE_THREAD happens.

(*) The key construction lock. This is only locked as often as keys are
instantiated or requested from userspace, so it shouldn't be a common
event on a system.

But basically, in answer to you question, this shouldn't be a problem - almost
all the operations you're likely to do - searching for and using keys - are
done with per-key shared spinlocks.

The construction queue locks could be made per-user, I suppose.

> - user_describe_key() appears to leak a key ref if kmalloc() fails.
> user_describe_key() appears to leak a key ref if copy_to_user() faults.

Fixed.

> The one-return-statement-per-function rule rules.

Not necessarily. Do you ever investigate the asm produced by gcc?

For example, if you look as sys_keyctl() where each case statement is along
the lines of:

case KEYCTL_NEW:
return add_user_key((key_serial_t) arg2,
(const char __user *) arg3,
(const char __user *) arg4,
(const void __user *) arg5);

If you change these case statements to:

case KEYCTL_NEW:
ret = add_user_key((key_serial_t) arg2,
(const char __user *) arg3,
(const char __user *) arg4,
(const void __user *) arg5);
break;

And just return ret after the end of the switch statement, then the compiler
will behave differently.

In the first case, the call will be rendered as a JUMP, in the second you'll
get a CALL, a BRANCH and a RETURN. In that case
one-return-statement-per-function sucks.

Similarly with exec_keys() where I'd like to end on a tail call:

/* newly exec'd tasks get a fresh process keyring */
return install_process_keyring(tsk);

But can only do so if there's no error. This is now:

/* newly exec'd tasks get a fresh process keyring */
ret = install_process_keyring(tsk);
error:
return ret;

Meaning gcc won't generate a JUMP, but rather CALL and RETURN.

Besides that, insisting on one return statement per function is slightly
tricky to make efficient when returning a pointer that either points to a
structure with an elevated refcount or has an error cast into it; especially
so when locking is involved.

Also it requires extra variables and gotos and indentation in cases where
they're not really necessary. However, since you insisted, every function
barring sys_keyctl() should now have a single return statement.

> - Various people are likely to get upset about this:
> ...
> I guess the pure way to do it is to add 13 new syscalls....

I don't really want to add any syscalls, though I wouldn't be too upset to add
just one:-/

What're other people's thoughts on this?

> - I really hesitate to stick my nose into this perennial, but
> keyring_hash() is a bit lame. I once read that
>
> hash = hash * 33 + *p++;
>
> works as well as pretty much anything else.

It makes no difference if you then do "hash & 0x1f" to pick your bucket (it's
equivalent to "hash += (hash << 5) + *p++"). I suppose you'd have to fold the
hash value down somehow.

> - All that keyring tree management (keyring_detect_cycle) looks tricky.
> Do we really need such complexity?

Absolutely. A keyring search is a tree search (we supported keyring nesting),
so we have to lock each level as we down, and unlock as we come back up, lest
the tree change beneath us.

If we don't prevent cycles in the tree, then we can end up deadlocking
ourselves if someone is waiting to writelock something we've got locked.

Besides, since a keyring pins all the keys it points to, you can end up with a
set of resources mutually pinning one another in memory.

Think of hardlinking directories on a filesystem.

> - __key_link() does
> but elsewhere you freely do kfree(0)

Fixed. Isn't it more efficient, though, to check beforehand if we know the
value might well be 0?

> - Why does the proc code do spin_lock_irq(&key_user_lock)? Other
> process-context code just does spin_lock().

Fixed. key_put() can be called from interrupt context. It used to grab those
locks there, and so anything locking those locks had to disable irqs. I've
since changed things so that the actual key destruction is deferred to process
context, so I don't have to disable irqs when touching those locks.

> - In key_create_or_update():
> ...
> you've used the stat.h symbols elsewhere.

Fixed. I thought I'd caught them all.

> - How come install_process_keyring() and friends take task_lock() around
> the key_put()? It's not done elsewhere.

It's not the key_put() that's locked, it's the xchg():

task_lock(tsk);
key_put(xchg(&tsk->process_keyring, keyring));
task_unlock(tsk);

I've changed these to not use xchg() as there's no point, and to put the
key_put() calls outside of the critical section.

The reason task_lock() is must be held on change of keyring is that
/proc/pid/status displays the serial numbers of those keyrings, and that can
be used from another task.

Since a task can only change its own keyring subscriptions, it does not need a
lock meerly to access those keyrings. If it did, then accessing current->mm,
etc. would require a lock too.

> Please update the what-it-locks comment over task_lock()

Done.

> - join_session_keyring() leaks the memory at `name'. Trivial DoS hole.
> Please audit the whole patch.

Done. I still can't guarantee I've caught every last one, though.

> - request_key_construction() leaks `key' if __call_request_key() fails.

> Multiple return statements again...

Same answer again...

> - request_key() appears to leak a ref to the key if key_user_lookup()
> fails. Multiple return statements.

I don't see it. We only get a key if search_process_keyrings() succeeds, and
if it did we returned immediately. If we didn't return, "key" was holding an
error.

I don't see how not having multiple return statements will make that easier
for you to understand, since it's the condition on the if-statement you've
misread. Please explain.

> - In user_instantiate():
> the key_put() can be removed.

Done.

> It's a lot of code, isn't it?

warthog>wc security/keys/*.c | sort -n
189 586 4584 security/keys/user_defined.c
214 591 5077 security/keys/proc.c
282 890 7090 security/keys/request_key.c
642 1678 14571 security/keys/process_keys.c
835 2489 18457 security/keys/keyctl.c
847 2390 19148 security/keys/keyring.c
925 2557 21606 security/keys/key.c
3934 11181 90533 total

I suppose so.

----

Anyway, I've been through and audited it again, and I've put in your suggested
changes (there're a lot fewer return statements than there used to be, and a
lot more gotos:-)

Also, what are people's thoughts on adding more error codes?

ENOKEY - Couldn't find or obtain a key
EKEYREVOKED - Key was revoked
EKEYEXPIRED - Key expired

And, maybe:

ENOTKEYRING - Key is not a keyring

Otherwise you might find, for example, open() return ENOENT because it
couldn't get a key...

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Nicholas Miell: "Re: PATCH: cdrecord: avoiding scsi device numbering for ide devices"
Previous message: Jean-Luc Cooke: "Re: [PATCH] Delete cryptoloop"
In reply to: Trond Myklebust: "Re: [PATCH] implement in-kernel keys & keyring management"
Next in thread: James Morris: "Re: [PATCH] implement in-kernel keys & keyring management [try #2]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]