Re: [RFC][PATCH -next 1/2] printk: move can_use_console out of console_trylock_for_printk

From: Sergey Senozhatsky
Date: Tue Jan 19 2016 - 10:02:52 EST


Hello,

On (01/19/16 14:31), Petr Mladek wrote:
> > On (01/18/16 16:42), Petr Mladek wrote:
> > > On Thu 2016-01-14 13:57:22, Sergey Senozhatsky wrote:
> > > > vprintk_emit() disables preemption around console_trylock_for_printk()
> > > > and console_unlock() calls for a strong reason -- can_use_console()
> > > > check. The thing is that vprintl_emit() can be called on a CPU that
> > > > is not fully brought up yet (!cpu_online()), which potentially can
> > > > cause problems if console driver accesses per-cpu data. A console
> > > > driver can explicitly state that it's safe to call it from !online
> > > > cpu by setting CON_ANYTIME bit in console ->flags. That's why for
> > > > !cpu_online() can_use_console() iterates all the console to find out
> > > > if there is a CON_ANYTIME console, otherwise console_unlock() must be
> > > > avoided.
> > > >
> > > > call_console_drivers(), called from console_cont_flush() and
> > > > console_unlock(), does the same test during for_each_console() loop.
> > > > However, we can have the following corner case. Assume that we have 2
> > > > cpus -- CPU0 is online, CPU1 is !online; and no CON_ANYTIME consoles
> > > > available.
> > > >
> > > > CPU0 online CPU1 !online
> > > > console_trylock()
> > > > ...
> > > > console_unlock()
> > >
> > > Please, where this console_unlock() comes from?
> >
> > from UP* or DOWN* (_PREPARE f.e.) notifiers on this CPU, for example, we don't
> > know what's going on there. what prevents it from calling console_trylock(),
> > grabbing the console_sem and eventually doing console_unlock()? there is
> > a can_use_console() check, but it handles only one case -- printk().
>
> So, is it a theoretical problem or do you know about any
> particular path where this happens?

a theoretical one at this point.

> Well, it might make sense to get rid of console_trylock_for_printk()
> and do the can_use_console() check at the beginning of
> unlock_console(). I mean to do
>
> - if (console_trylock_for_printk())
> + if (console_trylock())
> unlock_console();

agree, I almost removed it, but decided to keep for a while,
just in case if we would want to add any additional code there.


> But do we really need to repeat the check in every cycle?

well, on every iteration in the best case we check cpu_online()
only. which is what we would have done anyway in vprintk_emit(),
so no additional checks added. at the same time call_console_drivers
does not do '!cpu_online && !CON_ANYTIME' for each console now, so
in some sense there are less checks now (this is far even from a
micro-optimization, just noted).

> can_use_console() checks available consoles and if the CPU
> is online. Consoles could not get added or removed when
> we own console_sem. It seems that CPUs get disabled
> in a process context. Therefore it seems that it might happen
> only when unlock_console() gets rescheduled. I guess that
> it could not get scheduled back on an offline CPU. So, it
> seems that it is enough to check can_use_console() only once at
> the beginning.
>
> Or did I miss anything?

It does sound interesting, thanks. I was trying to keep the existing
behaviour.

It almost works, there is one case.

console_unlock() /* w/o can_use_console() in logbuf_lock section */
....
again:
for (;;) {
raw_spin_lock_irqsave logbuf_lock
msg_print_text
raw_spin_unlock logbuf_lock
call_console_drivers
local_irq_restore
}

up_console_sem

raw_spin_lock logbuf_lock
retry = console_seq != log_next_seq
raw_spin_unlock_irqrestore logbuf_lock

if retry && console_trylock()
goto again;


when we up_console_sem(), consoles may appear and disappear, since we
don't keep the semaphore. Suppose that we have OFFLINE cpu and we had a
CON_ANYTIME console, but in between up_console_sem--console_trylock
that single CON_ANYTIME console was removed. So now we have !cpu_online
and !CON_ANYTIME.

So this is why I re-do the can_use_console() check. I can add a bool flag
and do the can_use_console() only once -- when the flag is false, set it
to true on successful can_use_console() and avoid can_use_console() during
this loop; and set the flag to false right after the 'again:' label, which
will imply that console semaphore has been re-acquired and we need to
can_use_console() again.

something like this, perhaps (to be folded.. or should it be a separate
commit -- optimization). will give a test tomorrow.

I reused `bool retry' flag (which is probably a bit hacky, it'll be
better to have a separate byte for this thing). And I moved
can_use_console() from console_cont_flush() to the top -- so if we
are executing the `for (;;)' loop, then can_use_console() was
successful in console_cont_flush() and we have to re-do
can_use_console() only if we released console_sem and jumped to
`again' label.

---
kernel/printk/printk.c | 29 ++++++++++++++++++++---------
1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 28b2dec..c7a5ce8 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2170,6 +2170,11 @@ static bool console_cont_flush(char *text, size_t size)

raw_spin_lock_irqsave(&logbuf_lock, flags);

+ if (!can_use_console()) {
+ ret = false;
+ goto out;
+ }
+
if (!cont.len)
goto out;

@@ -2181,11 +2186,6 @@ static bool console_cont_flush(char *text, size_t size)
if (console_seq < log_next_seq && !cont.cons)
goto out;

- if (!can_use_console()) {
- ret = false;
- goto out;
- }
-
len = cont_print_text(text, size);
raw_spin_unlock(&logbuf_lock);
stop_critical_timings();
@@ -2219,7 +2219,7 @@ void console_unlock(void)
static u64 seen_seq;
unsigned long flags;
bool wake_klogd = false;
- bool do_cond_resched, retry, aborted = false;
+ bool do_cond_resched, retry = false, aborted = false;

if (console_suspended) {
up_console_sem();
@@ -2255,9 +2255,20 @@ again:

raw_spin_lock_irqsave(&logbuf_lock, flags);

- if (!can_use_console()) {
- aborted = true;
- break;
+ /*
+ * @retry == true implies that we have released console_sem
+ * and, thus, someone could have added/removed console(-s).
+ * we need to can_use_console() again. @retry intially set
+ * to false, because console_cont_flush() does the
+ * can_use_console() check and we can't execute this loop
+ * in can_use_console() returned `false` there.
+ */
+ if (retry) {
+ retry = false;
+ if (!can_use_console()) {
+ aborted = true;
+ break;
+ }
}

if (seen_seq != log_next_seq) {
--
2.7.0