Re: Linux 2.6.39-rc7

From: Konrad Rzeszutek Wilk
Date: Tue May 10 2011 - 19:37:19 EST


On Mon, May 09, 2011 at 11:53:56PM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, May 09, 2011 at 07:49:48PM -0700, Linus Torvalds wrote:
> > So things have been pretty quiet, and unless something major comes up
> > I believe that this will be the last -rc.
>
> Oh no! I was hoping for an extra week!
>
> The patch that I asked to be pulled (a38647837a411f7df79623128421eef2118b5884),
> 'xen/mmu: Add workaround "x86-64, mm: Put early page table high"',
>
> does not completely work around the regression that the patch from Yinghai titled
> "x86-64, mm: Put early page table high" introduced, wherein Linux can't boot under Xen.
>
> The failure is still encountered: https://lkml.org/lkml/2011/5/5/180. The
> previous git pull request, with an outline of the problem, is at https://lkml.org/lkml/2011/5/3/99
> and a huge amount of detail is in http://marc.info/?i=1302607192-21355-2-git-send-email-stefano.stabellini@xxxxxxxxxxxxx
>
> I was hoping that the rc6 could stretch out so that by the time hpa came back from
> his travels he would have had a chance to look at: https://lkml.org/lkml/2011/5/5/226

I had a chance to briefly talk on IRC with hpa and he mentioned I should
send a note to Ingo about this since hpa won't be able to do anything until Friday.

Ingo,
Not sure how familiar you are with this issue, but let me briefly explain it.
Yinghai provided a patch which calls memblock_find_in_range(), then calls
kernel_physical_mapping_init(), which populates the pagetable between pgt_buf_start
and pgt_buf_top, and once it is done, calls memblock_x86_reserve_range() with pgt_buf_start
and pgt_buf_end (wherein pgt_buf_end <= pgt_buf_top). The memory between pgt_buf_end
and pgt_buf_top is not reserved, so it can be re-used later on, and it is: other
subsystems, NUMA for example, use it.
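
To make the bookkeeping concrete, here is a rough userspace sketch of the three
markers. It is not the kernel code: struct pgt_buf and the pgt_buf_init(),
pgt_buf_alloc_page(), pgt_buf_reserved() and pgt_buf_leftover() helpers are
hypothetical names made up for illustration, and page frame numbers stand in
for real addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the early page table window (not kernel code). */
struct pgt_buf {
	uint64_t start;	/* models pgt_buf_start: first pfn of the window */
	uint64_t end;	/* models pgt_buf_end: next unused pfn */
	uint64_t top;	/* models pgt_buf_top: one past the last pfn of the window */
};

/* Stand-in for the range search: claim 'pages' pfns starting at 'base'. */
static struct pgt_buf pgt_buf_init(uint64_t base, uint64_t pages)
{
	struct pgt_buf b = { base, base, base + pages };
	return b;
}

/* Hand out one page-table page; returns UINT64_MAX once the window is full. */
static uint64_t pgt_buf_alloc_page(struct pgt_buf *b)
{
	if (b->end >= b->top)
		return UINT64_MAX;
	return b->end++;
}

/* Pages that end up reserved: [start, end). */
static uint64_t pgt_buf_reserved(const struct pgt_buf *b)
{
	return b->end - b->start;
}

/* Pages left unreserved, free for later users: [end, top). */
static uint64_t pgt_buf_leftover(const struct pgt_buf *b)
{
	return b->top - b->end;
}
```

With a window of 8 pages of which 3 are consumed by the pagetable, only those
3 are reserved and the remaining 5 go back into the pool that NUMA and others
allocate from.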

Under Xen, the pagetables end up being marked RO, so what ends up happening is that
some pages from pgt_buf_end through pgt_buf_top end up RO and the system crashes during
bootup as the NUMA subsystem tries to write to that area. The fix is essentially to mark
the area from pgt_buf_end through pgt_buf_top as RW.
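
As a toy illustration of that fix (again a userspace sketch, not the Xen MMU
code): the page_ro[] array, set_range_ro(), can_write() and
fixup_unused_pgt_range() are hypothetical names, and the sketch assumes for
simplicity that the whole window was marked RO, whereas in practice only some
of the pages past pgt_buf_end are affected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16	/* size of the toy address space, in pages */

static bool page_ro[NPAGES];	/* true = page is mapped read-only */

/* Mark pfns [from, to) read-only (ro = true) or read-write (ro = false). */
static void set_range_ro(size_t from, size_t to, bool ro)
{
	for (size_t pfn = from; pfn < to && pfn < NPAGES; pfn++)
		page_ro[pfn] = ro;
}

/* Would a write to this pfn succeed? */
static bool can_write(size_t pfn)
{
	return pfn < NPAGES && !page_ro[pfn];
}

/*
 * The fix as described above: once the pagetable is built, only
 * [pgt_buf_start, pgt_buf_end) has to stay RO, so flip
 * [pgt_buf_end, pgt_buf_top) back to RW.
 */
static void fixup_unused_pgt_range(size_t pgt_buf_end, size_t pgt_buf_top)
{
	set_range_ro(pgt_buf_end, pgt_buf_top, false);
}
```

Before the fixup a write to any page in [pgt_buf_end, pgt_buf_top) fails, which
is the crash seen during bootup; afterwards only the pages that really hold
pagetables remain RO.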

Stefano posted a patch, which was Acked by Yinghai, but not by hpa. The concerns
were that the patch inserts a hook just for this single case and that there should be a
better way of doing this - where we either don't need a hook or provide a semantic
explanation of the pagetable building and build the patch from there.

Sadly there was/is not enough time in the 2.6.39 train to actually do it properly.
So I provided another patch (which Linus merged) which crudely tries to mark the area from
pgt_buf_end through pgt_buf_top as RW, with all of it done within the Xen MMU code. Sadly it
does not work on all machines.

Without a resolution to this, the Linux x86_64 kernel cannot boot under Xen. There are two
options left right now:
a) revert 4b239f458c229de044d6905c2b0f9fe16ed9e01e (x86-64, mm: Put early page table high), or
b) revert the workaround that Linus merged and pick the one that Stefano came up with.
The patches are available in
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/bug-fixes-for-rc6

They touch the generic x86 MMU code.
