Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()

From: Jan Kara
Date: Wed Jul 23 2014 - 03:43:29 EST


On Wed 23-07-14 01:29:32, Andreas Bombe wrote:
> On Mon, Jul 21, 2014 at 12:04:34PM +0200, Jan Kara wrote:
> > On Sat 19-07-14 00:50:05, Andreas Bombe wrote:
> > > I don't see anything in /sys/kernel/debug/tracing/trace_pipe or
> > > .../trace (besides the header) with your patch applied. In case you
> > > meant to test it with the problematic printk change, I also tried with
> > > the revert reverted. That still hangs as before without any error report
> > > to see.
> > Yes, I meant testing my lockdep patch with the problematic printk change.
> > Thanks for having a look. I'm puzzled why it didn't help.
> >
> > > I checked the kernel logs and there is also no lockdep report anywhere.
> > > I get the "trace_printk() being used" notice but nothing else of
> > > interest around there. Though the notice should mean trace_printk() was
> > > used at least once?
> > Yes. Anyway, I'd be grateful if you could run one more test for me so
> > that I can better understand what's going on. Can you take recent vanilla
> > kernel (with the revert) and apply attached patch to it? It again enables
> > interrupts when calling console_unlock() but keeps lockdep coverage
> > unchanged. It helped Sasha so I want to see whether your case is similar or
> > different. Thanks!
>
> Applied on top of 15ba2236f, works fine.
Great. Thanks for testing. I'll send the patch to Andrew.

> I still don't see what printing could have triggered the problem. The
> only thing that is a warning is from the PCI code about some missing
> pcie-to-pci bridge (which I really should report some time). That isn't
> the culprit however since I tested a build with that WARN_ON_ONCE
> removed and it still hung.
>
> Okay, there's ACPI errors, but these seem to be rather late to matter, I
> think? Anyway, here's the log of this working boot from the start to
> just before initrd gets started:
The prints you can see are those that are fine ;). The thing is that when
lockdep covers more of the printk & console code itself (as was the case
with my original patch), it finds something which makes it crash the
machine. I'm not sure why my patch to make lockdep use trace_printk() didn't
help; maybe the nature of the crash is different from what I thought.
Sasha Levin was able to reproduce the problem in a virtual machine, so
I'll ask him for his config and hopefully I'll be able to reproduce it
myself and experiment with it. Thanks again for testing!

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR