Re: [PATCH] x86: only put e820 ram entries in resource tree

From: Ingo Molnar
Date: Mon Aug 25 2008 - 03:18:19 EST



* Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:

> Yinghai Lu <yhlu.kernel@xxxxxxxxx> writes:
>
> > may need user to have new kexec tools that could create e820 table
> > from /sys/firmware/memmap instead of /proc/iomem for second kernel
>
> Nacked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>
> /proc/iomem is mostly about io resources which you have just removed.
> It is totally the wrong thing to only register RAM resource!

see the RFC commit below for more details about the problem and the
various solutions we are thinking about. The core difficulty is that the
bug was hard to find and hard to debug - it took the exceptional
debugging effort of David Witbrodt to track it down.

So we are trying structural fixes to improve the situation. Just
reverting the e820 changes breaks other things and is not the real fix
anyway: the real fix is to increase communication between PC platform
devices/drivers and the PCI code. DMI driven quirks are too limited as
well - more such systems are suspected.

For now we've got the patch below from Yinghai - which hooks directly
into the x86 PCI discovery and reallocation code. While that's already
better than the initial DMI quirk, i think the real fix should go one
level higher, to the resource manager.

i'd rather see the e820 reserved entries show up there (losing system
setup information is almost always a bad idea - and the e820 map is
central enough to be one of the more reliable BIOS-provided data
structures), but with a different resource property: a 'sticky' resource
bit which would cause overlapping PCI devices that already have their
BARs programmed to stay there. We already have a certain amount of
support for 'container' resources (bridge resources for example).
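
To make the 'sticky' idea concrete, here is a minimal userspace sketch
in C. It is an illustration only, not the patch under discussion and
not existing kernel code: struct toy_res, TOY_RES_STICKY and
toy_may_relocate_bar() are hypothetical names, and the real resource
manager would need to do this inside its own tree walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag, not an existing kernel IORESOURCE_* bit. */
#define TOY_RES_STICKY 0x1u

/* Toy stand-in for struct resource: an inclusive [start, end] range. */
struct toy_res {
	uint64_t start;
	uint64_t end;
	unsigned int flags;
};

/* True if the two inclusive ranges share at least one address. */
static bool toy_overlaps(const struct toy_res *a, const struct toy_res *b)
{
	assert(a->start <= a->end && b->start <= b->end);
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Decide whether the PCI code may move a BAR. A BAR that the BIOS
 * already programmed and that overlaps a sticky region stays put.
 */
static bool toy_may_relocate_bar(const struct toy_res *regions, size_t n,
				 const struct toy_res *bar,
				 bool bios_programmed)
{
	size_t i;

	if (!bios_programmed)
		return true;	/* nothing to preserve */

	for (i = 0; i < n; i++) {
		if ((regions[i].flags & TOY_RES_STICKY) &&
		    toy_overlaps(&regions[i], bar))
			return false;
	}
	return true;
}
```

With an e820 reserved range marked sticky, a BIOS-programmed BAR that
overlaps it would be left alone, while an unprogrammed BAR or a BAR
overlapping only non-sticky ranges could still be reassigned.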

That would automatically protect any hpet (or, in theory, ioapic)
platform devices from the PCI code's currently blind resource
reprogramming logic. These platform devices are not PCI enumerated so we
cannot just make the platform drivers themselves be PCI drivers, and
they are special in many regards. (Often they are not PCI devices at
all.)

Note that this is only about the (BIOS provided) e820 map. The core
problem is that inserting e820 map reserved entries as 'real' resources
can break real devices.

Ingo

---------------->