Re: 2.6.{26.2,27-rc} oops on virtualbox

From: Luiz Fernando N. Capitulino
Date: Thu Aug 28 2008 - 09:30:42 EST


On Wed, 27 Aug 2008 19:33:28 -0400
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx> wrote:

| * Luiz Fernando N. Capitulino (lcapitulino@xxxxxxxxxxxxxxx) wrote:
| > On Tue, 26 Aug 2008 22:34:49 +0200
| > Gerhard Brauer <gerhard.brauer@xxxxxx> wrote:
| >
| > | On Tue, Aug 26, 2008 at 02:15:58PM -0400, Mathieu Desnoyers wrote:
| > | >
| > | > Ok, it might still be caused by paravirt and alternatives instruction
| > | > patching. What if you also do :
| > | >
| > | > alternative_instructions()
| > | >
| > | > + unsigned long flags;
| > | > /* The patching is not fully atomic, so try to avoid local interruptions
| > | > that might execute the to be patched code.
| > | > Other CPUs are not running. */
| > | > stop_nmi();
| > | > #ifdef CONFIG_X86_MCE
| > | > stop_mce();
| > | > #endif
| > | > + local_irq_save(flags);
| > | >
| > | >
| > | > ...
| > | > + local_irq_restore(flags);
| > | > restart_nmi();
| > | > #ifdef CONFIG_X86_MCE
| > | > restart_mce();
| > | > #endif
| > | >
| > | > ?
| > |
| > | Hej! These last changes (in addition to the others you mentioned) seem
| > | to be a good shot. I could reboot the guest 8 times, compile several
| > | packages (something which always leads to the oops) and I am currently
| > | building two big packages simultaneously. So this is heavy IO.
| >
| > Yeah, it works for me too and it's good to know that you are doing
| > additional tests. I'm doing only boot tests... I was testing lots of
| > kernels and doing additional tests would take a lot of time.
| >
| > Now, what does this mean? Is VirtualBox issuing interrupts when it
| > shouldn't or should this section of the code be better protected?
| >
|
| Since this problem appears while we are using a simple memcpy (the
| text_poke_early version), but disappears when we disable interrupts for
| a longer period of this, I suspect a problem with irq disabling in
| Virtualbox.
|
| We could try to add some nsleep() or msleep() calls within text_poke and
| text_poke_early before and after the code modification to see if the
| problem disappears. If it does, then that would somewhat confirm the
| racy irq disable thesis.

Well, an Ubuntu kernel guy has reported in the VirtualBox ticket [1]
that the oops doesn't happen if he puts a printk() at the crash site.

The funny thing is that someone (who might be a VirtualBox developer)
used the same race argument to say that this is a bug in the kernel.

What concerns me, though, is how VirtualBox can be worth using
in the Linux community if it's probably not working for various distros
(currently Fedora, Ubuntu, Mandriva and ArchLinux).

Thanks for the effort, guys.

[1] http://www.virtualbox.org/ticket/1875

--
Luiz Fernando N. Capitulino