Re: hostile takeover: Re: [PATCH printk v2 3/8] printk: nbcon: Add acquire/release logic

From: John Ogness
Date: Tue Aug 29 2023 - 06:23:04 EST


On 2023-08-09, Petr Mladek <pmladek@xxxxxxxx> wrote:
>> Add per console acquire/release functionality. The console 'locked'
>> state is a combination of multiple state fields:
>>
>> - Hostile takeover
>>
>> The new owner takes the console over without 'req_prio'
>> handshake.
>>
>> This is required when friendly handovers are not possible,
>> i.e. the higher priority context interrupted the owning
>> context on the same CPU or the owning context is not able
>> to make progress on a remote CPU.
>
> I always expected that there would be only one hostile takeover.
> It would allow to take the lock a harsh way when the friendly
> way fails.

You are referring to the hostile takeover when unsafe. A hostile
takeover when safe is still considered a hostile takeover. (At least,
until now that is how it was considered. More below...)

>> All other policy decisions have to be made at the call sites:
>>
>> - What is marked as an unsafe section.
>> - Whether to spinwait if there is already an owner.
>> - Whether to attempt a hostile takeover when safe.
>> - Whether to attempt a hostile takeover when unsafe.
>
> But there seems to be actually two variants. How they are
> supposed to be used, please?
>
> I would expect that a higher priority context would always
> be able to takeover the lock when it is in a safe context.
>
> By other words, "hostile takeover when safe" would be
> the standard behavior for context with a higher priority.

The difference is that with "hostile takeover when safe" there is a
foreign CPU that still thinks it has the lock, even though it does not.

> By other words, the difference between normal takeover and
> "hostile takeover when safe" is that the 1st one has to
> wait until the current owner calls nbcon_enter_unsafe().
> But the result is the same. The current owner might
> prematurely end after calling nbcon_enter_unsafe().
>
> Maybe, this another relic from the initial more generic approach?

I suppose so. But then why not try the "hostile takeover when safe"
first and only use the handover request as a fallback? Like this:

1. try direct
2. try hostile takeover when safe
3. try handover
4. try hostile takeover when unsafe

Then we should remove the "hostile" label for #2 and just call it:

1. try direct
2. try safe takeover
3. try handover
4. try hostile takeover

(I guess this is how you imagined things should be.)

>> +/**
>> + * struct nbcon_context - Context for console acquire/release
>> + * @console: The associated console
>> + * @spinwait_max_us: Limit for spinwait acquire
>> + * @prio: Priority of the context
>> + * @unsafe: This context is in an unsafe section
>
> This seems to be an input value for try_acquire(). It is
> controversial.
>
> I guess that it removes the need to call nbcon_enter_unsafe()
> right after try_acquire_console().

Yes. It removes the "safe window" between try_acquire_console() and
nbcon_enter_unsafe() by allowing to acquire and mark unsafe
atomically. For example, this would be useful for the port_lock()
wrapper we discussed. But it is not strictly necessary. I can remove it.

Below I answer your comments anyway.

> Hmm, this semantic is problematic:
>
> 1. The result would be non-paired enter_unsafe()/exit_unsafe()
> calls.

The paired calls are either:

try_acquire(unsafe=1) -> release()

or

try_acquire(unsafe=0) + enter_unsafe() -> exit_unsafe() + release()

> 2. I would personally expect that this is an output value
> set by try_acquire() so that the context might know
> how it got the lock.

The state structure is used to communicate this information.

> 3. Strictly speaking, as an input value, it would mean that
> try_acquire() is called when the console is in "unsafe" state.
> But the caller does not know in which state the console
> is until it acquires the lock.

This is not about the current state. The calling _context_ wants to set
a certain state. The caller wants to acquire the lock and atomically
mark this as the beginning of an unsafe section. But as I wrote, this is
not strictly necessary. Only an efficient convenience.

> For me it is not easy to remember which permutation is used where ;-)
> It would be easier if we remove the "hostile when safe" variant.
> Then 3 variables might be enough. I suggest something like:
>
> state.unsafe Console is busy and takeover is not safe.
>
> state.unsafe_takeover A hostile takeover in an unsafe state happened
> in the past. The console can't be safe until
> re-initialized.
>
> ctxt.allow_unsafe_takeover Allow hostile takeover even if unsafe.
> Can be used only with PANIC prio. Might cause
> a system freeze when the console is used later.

Since "safe takeovers" should be handled equivalent to "handovers" then
I agree with your suggestion.

John