[arjanv@redhat.com: Re: [PATCH] shrink per_cpu_pages to fit 32byte cacheline]

From: Marcelo Tosatti
Date: Thu Sep 23 2004 - 10:46:45 EST

Next message: Randy.Dunlap: "Re: watchdog"
Previous message: Randy.Dunlap: "Re: where can find the early-initcall macro definition."
Next in thread: Giuliano Pochini: "Re: [arjanv@redhat.com: Re: [PATCH] shrink per_cpu_pages to fit32byte cacheline]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Forgot to CC linux-kernel, just in case someone else
can have useful information on this matter.

Andi says any additional overhead will be in the noise
compared to cacheline saving benefit.

***********

Jun,

We need some assistance here - you can probably help us.

Within the Linux kernel we can benefit from changing some fields
of commonly accessed data structures to 16 bit instead of 32 bits,
given that the values for these fields never reach 2 ^ 16.

Arjan warned me, however, that the prefix (in this case "data16") will
cause an additional extra cycle in instruction decoding, per message above.

Can you confirm that please? We can't seem to be able to find
it in Intel's documentation.

By shrinking two fields of "per_cpu_pages" structure we can fit it
in one 32-byte cacheline (<= Pentium III and probably several other
embedded/whatnot architectures will benefit from such a change).

And we just shrank two fields of "struct pagevec" in a similar way
in Andrew's -mm tree.

I'm adding linux-kernel just in case someone else can have
useful comments.

Thanks.

----- Forwarded message from Arjan van de Ven <arjanv@xxxxxxxxxx> -----

From: Arjan van de Ven <arjanv@xxxxxxxxxx>
Date: Tue, 14 Sep 2004 13:13:29 +0200
To: Marcelo Tosatti <marcelo.tosatti@xxxxxxxxxxxx>
Cc: akpm@xxxxxxxx, "Martin J. Bligh" <mbligh@xxxxxxxxxxx>,
linux-mm@xxxxxxxxx
In-Reply-To: <20040914093407.GA23935@xxxxxxxxxx>
Subject: Re: [PATCH] shrink per_cpu_pages to fit 32byte cacheline
Original-Recipient: rfc822;linux-mm@xxxxxxxxx
X-Loop: owner-majordomo@xxxxxxxxx
X-MIMETrack: Itemize by SMTP Server on USMail/Cyclades(Release 6.5.1|January 21, 2004) at
09/14/2004 03:13:02

On Tue, Sep 14, 2004 at 06:34:07AM -0300, Marcelo Tosatti wrote:
> How come short access can cost 1 extra cycle? Because you need two "read bytes" ?

on an x86, a word (2byte) access will cause a prefix byte to the
instruction, that particular prefix byte will take an extra cycle during execution
of the instruction and potentially reduces the parallal decodability of
instructions....

----- End forwarded message -----

----- End forwarded message -----
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Randy.Dunlap: "Re: watchdog"
Previous message: Randy.Dunlap: "Re: where can find the early-initcall macro definition."
Next in thread: Giuliano Pochini: "Re: [arjanv@redhat.com: Re: [PATCH] shrink per_cpu_pages to fit32byte cacheline]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]