Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor

From: Nicolas Pitre
Date: Tue Jun 04 2013 - 17:19:08 EST


On Tue, 4 Jun 2013, Stephen Boyd wrote:

> On 06/04, Nicolas Pitre wrote:
> > On Mon, 3 Jun 2013, Stephen Boyd wrote:
> >
> > > On 06/03/13 15:45, Russell King - ARM Linux wrote:
> > > > On Mon, Jun 03, 2013 at 03:37:39PM -0700, Stephen Boyd wrote:
> > > >> In my case I'm booting a kernel with textoffset = 0x208000 but RAM
> > > >> starts at 0x0. Does "minimum of RAM start" mean 0x0 or 0x200000?
> > > > The basic requirement for zImages is no less than the start of RAM
> > > > plus 32K. Or let me put it another way - start of writable memory
> > > > plus 32K.
> > > >
> > > > Whether you need an offset of 0x200000 or not is not for the
> > > > decompressor to know. If you're having to avoid the first 0x200000
> > > > bytes of memory for some reason (eg, secure firmware or DSP needs
> > > > it left free) then there's no way for the decompressor to know that,
> > > > so it's irrelevant.
> > > >
> > > > So, let's say that your platform has a DSP which needs the first 0x200000
> > > > bytes left free. So the boot loader _already_ needs to know to load
> > > > the image not at zero, but above 0x200000. The additional 32K
> > > > requirement is really nothing new and so should be treated in just the
> > > > same way.
> > > >
> > > > Leave at least 32K of usable memory below the zImage at all times.
> > >
> > > Understood. On my device writeable RAM actually starts at 0x0 but I have
> > > compiled in support for devices which don't have writeable memory at
> > > 0x0, instead they have writeable memory starting at 0x200000. Because I
> > > have a kernel supporting more than one device with differing memory
> > > layouts I run into this problem. The same problem will occur to any
> > > devices in the multi-platform kernel when a device with unwriteable
> > > memory near the bottom (such as MSM8960) joins the multi-platform defconfig.
> > >
> > > Let me try to word it in your example. I have compiled in support for a
> > > platform that has a DSP which needs the first 0x200000 bytes left free.
> > > I have also compiled in support for a platform that doesn't have this
> > > requirement. I plan to run the zImage on the second platform (the one
> > > without the DSP requirement). The bootloader I'm running this zImage on
> > > has no idea that I've compiled in support for the other platform with
> > > the DSP requirement so it assumes it can load the zImage at the start of
> > > RAM (0x0) plus 32K. This is bad because then the page tables get written
> > > into my compressed data and it fails to decompress.
> >
> > I've looked at the code and I think that #1 in your initial options is
> > probably best here. I agree with Russell about #2 being way too complex
> > for only this case.
> >
> > So, right before calling into cache_on, you could test if r4 - 16K >= pc
> > and r4 < pc + (_end - .) then skip cache_on.
> >
> > Something like this untested patch:
>
> So this would cause the decompression to run without the cache on
> if we have to relocate the decompression code to avoid
> overwriting ourselves?

No. What my example patch does is to simply skip setting up a page
table and turning on the cache if the page table ends up in the
code/data to be relocated.
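
To make the condition concrete, here is a rough C sketch of the kind of
check involved. It is a generic interval-overlap test, not the exact
comparison from the patch, and it assumes the 16K page directory sits
immediately below r4 as discussed above. The function name, the
PGDIR_SIZE macro and the image bounds are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the page directory occupies the 16K just below r4. */
#define PGDIR_SIZE 0x4000u

/*
 * Hypothetical helper: returns true if the page directory
 * [r4 - 16K, r4) intersects the loaded image [img_start, img_end).
 * In that case the page table would be written into code/data that
 * still has to be relocated, so cache_on would be skipped.
 */
static bool pgdir_overlaps_image(uint32_t r4, uint32_t img_start,
                                 uint32_t img_end)
{
    assert(r4 >= PGDIR_SIZE);   /* avoid unsigned wrap-around */
    assert(img_start < img_end);

    uint32_t pg_start = r4 - PGDIR_SIZE;

    /* Two half-open ranges overlap iff each starts before the other ends. */
    return pg_start < img_end && img_start < r4;
}
```

With Stephen's numbers (image loaded at 0x8000, r4 = 0x208000) and a
made-up image size of 3MB, the page directory at 0x204000 falls inside
the image and the check returns true. With a made-up 1MB image it ends
at 0x108000, well below 0x204000, and the check returns false.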

> It seems that the memcpy is fairly quick on my hardware in comparison
> to the decompression so moving the cache_on() call to right before we
> run decompression keeps things pretty fast. It's very possible
> different hardware will have different results.

"Fairly quick" is still not optimal.

> This is what I meant by option #1. I suppose
> we can make it smarter and conditionalize it on if we relocated
> or not?

Here's what I'm suggesting: