Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap

From: Toshi Kani
Date: Mon Jun 22 2015 - 13:23:58 EST


On Mon, 2015-06-22 at 09:21 -0700, Mike Travis wrote:
>
> On 6/19/2015 2:44 PM, Toshi Kani wrote:
> > __ioremap_caller() calls region_is_ram() to look up the resource
> > to check if a target range is RAM, which was added as an additinal
> > check to improve the lookup performance over page_is_ram() (commit
> > 906e36c5c717 "x86: use optimized ioresource lookup in ioremap
> > function").
> >
> > __ioremap_caller() then calls walk_system_ram_range(), which had
> > replaced page_is_ram() to improve the lookup performance (commit
> > c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").
> >
> > Since both functions walk through the resource table, there is
> > no need to call the two functions. Furthermore, region_is_ram()
> > has bugs and always returns with -1. This makes
> > walk_system_ram_range() as the only check being used.
>
> Do you have an example of a failing case?

Well, region_is_ram() fails with -1 for every case... This is because it
breaks the for-loop at 'if (p->end < start)' in the first entry (i.e.
the lowest address range) of the resource table. For example, the first
entry of 'p->end' is 0xfff on my system, which is certainly smaller than
'start'.

# cat /proc/iomem
00000000-00000fff : reserved
00001000-00092fff : System RAM
00093000-00093fff : reserved
:

> Also, I didn't know that
> IOREMAP'd addresses were allowed to be on non-page boundaries?

Yes, that is allowed. Here is a comment in __ioremap_caller().

* NOTE! We need to allow non-page-aligned mappings too: we will
obviously
* have to convert them into an offset in a page-aligned mapping, but
the
* caller shouldn't need to know that small detail.

> Here's the comment and reason for the patches from Patch 0:
>
> <<<
> We have a large university system in the UK that is experiencing
> very long delays modprobing the driver for a specific I/O device.
> The delay is from 8-10 minutes per device and there are 31 devices
> in the system. This 4 to 5 hour delay in starting up those I/O
> devices is very much a burden on the customer.
> ...
> The problem was tracked down to a very slow IOREMAP operation and
> the excessively long ioresource lookup to insure that the user is
> not attempting to ioremap RAM. These patches provide a speed up
> to that function.
> >>>
>
> The speed up was pretty dramatic, I think to about 15-20 minutes
> (the test was done by our local CS person in the UK). I think this
> would prove the function was working since it would have fallen
> back to the previous page_is_ram function and the 4 to 5 hour
> startup.

I wonder how this test was conducted. When the region_is_ram() change
got in, the kernel already had the other speed up change (c81c8a1eeede),
which had replaced page_is_ram().

> If there is a failure, it would be better for all to fix the specific
> bug and not re-introduce the original problem. Perhaps drop to
> page is ram if the address is not page aligned?

walk_system_ram_range() is the one that has the issue with
page-unaligned address. region_is_ram() after fixed by patch 3/3 does
not have this issue. ioremap() does not call page_is_ram() anymore.

pcibios_add_device(), ksysfs.c, kdebugfs.c (and possibly more) call
ioremap for setup_data, which is page-unaligned. If we are going to
disallow such callers, they all need to be converted with a different
mapping interface, such as vmap(). But vmap() takes an array of page
pointers as an argument, and is not convenient for them to use. Since
setup_data is a special range, if they need to be changed may be
arguable. I think issue 3 described in patch 0/3 needs further
discussion. For now, I'd like to fix issue 1 & 2.

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at http://www.tux.org/lkml/