Re: [RFC 4/4] {RFC} kmod.c: Add new call_usermodehelper_timeout()API

From: Boaz Harrosh
Date: Thu Mar 22 2012 - 15:09:17 EST


On 03/22/2012 07:27 AM, Oleg Nesterov wrote:
> On 03/21, Boaz Harrosh wrote:
>>
>>> @@ -258,7 +262,8 @@ static void __call_usermodehelper(struct work_struct *work)
>>>
>>> switch (wait) {
>>> case UMH_NO_WAIT:
>>> - call_usermodehelper_freeinfo(sub_info);
>>> + kref_put(&sub_info->kref, call_usermodehelper_freeinfo);
>>> + kref_put(&sub_info->kref, call_usermodehelper_freeinfo);
>>> break;
>
> This doesn't look very nice. If you add the refcounting, it should be
> consistent. Imho it is better to change call_usermodehelper_exec() so
> that UMH_NO_WAIT does kref_put() too. Just s/goto unlock/goto out/ afaics.
>

Yes I've seen this. after I sent the patch. Hence the RFC tag

>>> @@ -452,22 +459,27 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info,
>>>
>>> sub_info->complete = &done;
>>> sub_info->wait = wait;
>>> + if (!sub_info->wait_timeout)
>>> + sub_info->wait_timeout = MAX_SCHEDULE_TIMEOUT;
>>>
>>> + /* Balanced in __call_usermodehelper or wait_for_helper */
>>> + kref_get(&sub_info->kref);
>>> queue_work(khelper_wq, &sub_info->work);
>>> if (wait == UMH_NO_WAIT) /* task has freed sub_info */
>>> goto unlock;
>>> - wait_for_completion(&done);
>>> - retval = sub_info->retval;
>>> -
>>> + if (likely(wait_for_completion_timeout(&done, sub_info->wait_timeout)))
>>> + retval = sub_info->retval;
>>> + else
>>> + retval = -ETIMEDOUT;
>>> out:
>>> - call_usermodehelper_freeinfo(sub_info);
>>> + kref_put(&sub_info->kref, call_usermodehelper_freeinfo);
>>> unlock:
>>> helper_unlock();
>>> return retval;
>>> }
>
> This looks obviously wrong. You also need to move *sub_info->complete
> into subprocess_info.
>

Yes I caught that with farther testing. A stupid mistake. Again RFC

>> Author: Oleg Nesterov <oleg@xxxxxxxxxx>
>> Date: Wed Mar 21 10:57:41 2012 +1100
>>
>> usermodehelper: implement UMH_KILLABLE
>>
>> Implement UMH_KILLABLE, should be used along with UMH_WAIT_EXEC/PROC. The
>> caller must ensure that subprocess_info->path/etc can not go away until
>> call_usermodehelper_freeinfo().
>> ...
>>
>> I think that my patch above does a much better/cleaner lifetime management of the
>> subprocess_info struct, with the use of a kref.
>
> This is subjective, you know ;) I specially tried to avoid the
> refcounting.
>

Why?

The all kref_ abstraction comes to a simple atomic_inc/dec.
Which is in theory a more lite wait operation then xchg, no memory
bus locking, and in practice is the same. (Except on massively
parallel machines which it is)

The last time I submitted a patch with xchg I got clobbered on the head
so strong that I ran away from it as-fast-as-I-could.

For objects life cycle the kref_get/put pattern is a much simpler
more common and understood style in the Kernel, if just for that sake.

I don't see why it needs to be "avoided".

> In any case. I do not know why do we need timeout, but this is
> orthogonal to KILLABLE. Please redo your patches on top of -mm
> tree? Please note that in this case the change becomes trivial.
>

Yes you are right.

> And please explain the use-case for the new API.
>

The reason I need a timeout, is because: Calling from Kernel to
user-mode gives me the creeps. I don't trust user-mode programs,
specially when in final Control by a Distribution. Bugs can happen
and deadlocks are a possibility. An operation that should take
1/2 second and could max to at most 1.5 seconds, I can say in
confidence that after 15 seconds, a dmesg and a clean error recovery
is better. I don't want any chance of D stating IO operations.
(My code is in the IO path, either fsync or write-back. There is not
always a killable target)

The code path I have is easily recoverable, and if not for the scary
message in dmesg the user will not notice.

So in short it is so I can sleep at night.

>> Anyway I thought that we are not
>> suppose to use xhcg() since it is not portable to all ARCHs. ;-)
>
> Hmm. For example, exit_mm() does xchg().
>

Again, Personally I like xchg, but not here, not for an object
life-time management. Two threads share a structure, that needs
to go when the last one ends. That's a kref_ abstraction. Kref,
inside, could be implemented with xchg(), But that's not for me to
decide, I should use good abstractions when they exist and do the
job (well). No?

> Oleg.
>

Thanks Oleg, yes I'll rebase, Is there an mm git tree? I could not
find it on git://git.kernel.org/pub/scm/ . mean while I'll use a
random linux-next/master point. Which should do the job.

Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/