Re: [BUG]: when printk too more through serial, cpu up is failed.

From: Shuge
Date: Thu Mar 14 2013 - 21:34:19 EST


On 2013-03-14 22:05, Greg KH wrote:
On Thu, Mar 14, 2013 at 09:51:34PM +0800, Shuge wrote:
Hi all,
When the kernel emits too many log messages through printk, a CPU fails to come online.
The problem is as follows.
For example, when cpu0 brings up cpu1:

a. cpu0 calls cpu_up():
cpu_up()
->_cpu_up()
->__cpu_notify(CPU_UP_PREPARE)
->__cpu_up()
->boot_secondary()
# ->wait_for_completion_timeout(&cpu_running, msecs_to_jiffies(1000))
-> if (!cpu_online(cpu)) {
       pr_crit("CPU%u: failed to come online\n", cpu);
       ret = -EIO;
   }
->cpu_notify(CPU_ONLINE)

b. cpu1 enters the kernel:
secondary_start_kernel()
@ ->printk("CPU%u: Booted secondary processor\n", cpu)
* ->calibrate_delay()
->set_cpu_online()
->complete(cpu_running)
->cpumask_set_cpu()

When cpu0 reaches mark #, it waits for cpu1 to complete cpu_running
and set itself online.
Normally cpu0 sees the completion in time. But if __log_buf is very
large, or other threads keep writing to it without pause while cpu1 is
at mark @ or *, cpu1 stays busy flushing the buffer to the console.
That takes longer than 1 s, and because cpu1 has not yet joined the
scheduler, cpu0's wait times out.
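
To put a rough number on the timeout described above, here is a minimal sketch that estimates how long a serial console needs to drain a backlog. The helper names, the 8N1 framing (10 bits on the wire per byte), and the 115200 baud figure used in the note below are assumptions for illustration; none of them come from the report.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: estimate how many milliseconds
 * a serial console needs to emit `bytes` of backlog at `baud` bits/s,
 * assuming 8N1 framing (10 bits on the wire per byte).
 */
static uint64_t drain_time_ms(uint64_t bytes, uint64_t baud)
{
	assert(baud > 0);
	return (bytes * 10 * 1000) / baud;
}

/*
 * Returns 1 if draining the backlog alone would take longer than the
 * 1000 ms that cpu0 waits on cpu_running, 0 otherwise.
 */
static int exceeds_cpu_up_timeout(uint64_t bytes, uint64_t baud)
{
	return drain_time_ms(bytes, baud) > 1000;
}
```

Under these assumptions, about 11520 bytes of pending log is already enough to use up the whole 1000 ms at 115200 baud, and that is before any other CPU appends more.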
Reading printk.c, I found that can_use_console(), which is called by
console_trylock_for_printk(), always returns true. This is because
have_callable_console() always returns true if the console driver
sets the CON_ANYTIME flag. I think a CPU should not flush __log_buf
while it is coming online, even when have_callable_console() is true.

/*
 * Can we actually use the console at this time on this cpu?
 *
 * Console drivers may assume that per-cpu resources have
 * been allocated. So unless they're explicitly marked as
 * being able to cope (CON_ANYTIME) don't call them until
 * this CPU is officially up.
 */
static inline int can_use_console(unsigned int cpu)
{
	return cpu_online(cpu) || have_callable_console();
}

In can_use_console(), why is the operator || and not &&?
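
The || versus && question can be made concrete with a small userspace model. This is only a sketch: the two booleans stand in for cpu_online(cpu) and have_callable_console(), and the function names are hypothetical, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the quoted logic: either condition alone is enough. */
static bool can_use_console_or(bool cpu_is_online, bool has_anytime_console)
{
	return cpu_is_online || has_anytime_console;
}

/* Model of the && variant being asked about (hypothetical). */
static bool can_use_console_and(bool cpu_is_online, bool has_anytime_console)
{
	return cpu_is_online && has_anytime_console;
}

/* Walks the two cases where the variants disagree. */
static void check_can_use_console_models(void)
{
	/* Secondary CPU not yet online, console driver sets CON_ANYTIME. */
	assert(can_use_console_or(false, true) == true);
	assert(can_use_console_and(false, true) == false);

	/* Online CPU, no CON_ANYTIME console registered. */
	assert(can_use_console_or(true, false) == true);
	assert(can_use_console_and(true, false) == false);
}
```

The first case is the one this report is about. The second shows the cost of a plain &&: an online CPU could no longer print through a console that lacks CON_ANYTIME, which may be why the code comment treats the flag as an exemption rather than a requirement.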

Kernel Version: 3.3.0
Why such an old and obsolete kernel version? Please try this on 3.8,
lots of work has gone into the printk area that should have solved this
issue.

greg k-h

I looked at printk.c in version 3.9. It still checks console_trylock_for_printk() to decide whether to call console_unlock(). In vprintk_emit(), cpu1 still has the opportunity to execute console_unlock() while it is coming online.
Once a CPU that is coming online starts flushing the buffer, nothing interrupts it until the buffer is empty. But we cannot guarantee that nobody keeps writing to __log_buf. This is dangerous! I think the solution is to prevent a CPU from using the console while it is coming online.
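
The concern about a flush that never ends can be illustrated with a toy model. This is a sketch and not kernel code: the function name, the notion of a "tick", and the per-tick rates are all assumptions made for illustration.

```c
#include <assert.h>

/*
 * Toy model: one CPU drains a log backlog while other CPUs keep
 * appending to it. Each tick, in_per_tick bytes arrive and up to
 * out_per_tick bytes are written to the console.
 *
 * Returns the number of ticks until the backlog is empty, or -1 if
 * it is still non-empty after max_ticks.
 */
static long ticks_to_drain(long backlog, long in_per_tick,
			   long out_per_tick, long max_ticks)
{
	assert(backlog >= 0 && in_per_tick >= 0 && out_per_tick >= 0);

	for (long t = 0; t < max_ticks; t++) {
		if (backlog == 0)
			return t;
		backlog += in_per_tick;
		backlog -= (backlog < out_per_tick) ? backlog : out_per_tick;
	}
	return backlog == 0 ? max_ticks : -1;
}
```

If one tick is read as 1 ms and max_ticks as 1000, a return value of -1 corresponds to the flushing CPU missing the 1000 ms window. In this model that happens whenever writers add data at least as fast as the console drains it, no matter how small the initial backlog is.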