Re: [RFC PATCH] livepatch: Speed up transition retries

From: Petr Mladek
Date: Thu Jul 08 2021 - 06:35:27 EST


On Wed 2021-07-07 14:49:41, Vasily Gorbik wrote:
> That's just a racy hack for now for demonstration purposes.
>
> On a s390 system with large amount of cpus
> klp_try_complete_transition() often cannot be "complete" from the first
> attempt. klp_try_complete_transition() schedules itself as delayed work
> after a second delay. This accumulates to significant amount of time when
> there are large number of livepatching transitions.
>
> This patch tries to minimize this delay to counting processes which still
> need to be transitioned and then scheduling
> klp_try_complete_transition() right away.
>
> For s390 LPAR with 128 cpu this reduces livepatch kselftest run time
> from
> real 1m11.837s
> user 0m0.603s
> sys 0m10.940s
>
> to
> real 0m14.550s
> user 0m0.420s
> sys 0m5.779s
>
> And qa_test_klp run time from
> real 5m15.950s
> user 0m34.447s
> sys 15m11.345s
>
> to
> real 3m51.987s
> user 0m27.074s
> sys 9m37.301s
>
> Would smth like that be useful for production use cases?
> Any ideas how to approach that more gracefully?

Honestly, I do not see a real-life use case for this, except maybe
speeding up a test suite.

The livepatch transition is more about reliability than about speed.
In real life, a livepatch is applied only once in a while.

We have spent weeks thinking about and discussing the consistency
model, code, and barriers to handle races correctly. In particular,
klp_update_patch_state() is a super-sensitive beast because it is
called without klp_lock. It might be pretty hard to synchronize
it with klp_reverse_transition() or klp_force_transition().

You would need to come up with a really convincing use case and
numbers to make it worth the effort.

Best Regards,
Petr