Re: [PATCH] x86, vmlinux.lds: Add debug option to force all data sections aligned

From: Feng Tang
Date: Thu Feb 17 2022 - 02:38:03 EST


On Thu, Feb 17, 2022 at 02:22:32PM +0800, Feng Tang wrote:
> Hi Densy,
>
> Thanks for the review!
>
> On Wed, Feb 16, 2022 at 02:35:10PM +0100, Denys Vlasenko wrote:
> > On 2/16/22 9:28 AM, Feng Tang wrote:
> > > 0day has reported many strange performance changes (regression or
> > > improvement), in which there was no obvious relation between the culprit
> > > commit and the benchmark at the first look, and it causes people to doubt
> > > the test itself is wrong.
> > >
> > > Upon further check, many of these cases are caused by the change to the
> > > alignment of kernel text or data, as whole text/data of kernel are linked
> > > together, change in one domain can affect alignments of other domains.
> > >
> > > To help quickly identifying if the strange performance change is caused
> > > by _data_ alignment, add a debug option to force the data sections from
> > > all .o files aligned on THREAD_SIZE, so that change in one domain won't
> > > affect other modules' data alignment.
> > >
> > > We have used this option to check some strange kernel changes [1][2][3],
> > > and those performance changes were gone after enabling it, which proved
> > > they are data alignment related. Besides these publicly reported cases,
> > > recently there are other similar cases found by 0day, and this option
> > > has been actively used by 0Day for analyzing strange performance changes.
> > ...
> > > + .data : AT(ADDR(.data) - LOAD_OFFSET)
> > > +#ifdef CONFIG_DEBUG_FORCE_DATA_SECTION_ALIGNED
> > > + /* Use the biggest alignment of below sections */
> > > + SUBALIGN(THREAD_SIZE)
> > > +#endif
> >
> > "Align every input section to 4096 bytes" ?
>
> The THREAD_SIZE depends on ARCH, and could be 16KB or more (KASAN related).
>
> > This is way, way, WAY too much. The added padding will be very wasteful.
>
> Yes, its too much. One general kernel's data section could be bumped
> from 18MB to 50+ MB.
>
> And I should make it more clear, that this is for debugging only.
>
> > Performance differences are likely to be caused by cacheline alignment.
> > Factoring in an odd hardware prefetcher grabbing an additional
> > cacheline after every accessed one, I'd say alignment to 128 bytes
> > (on x86) should suffice for almost any scenario. Even 64 bytes
> > would almost always work fine.
>
> Yep, when I started it, I tried 128 bytes for HW adjacent cacheline
> prefetch consideration. But the built kernel won't boot, then I
> tried 512/1024/2048/4096 but there were still boot issue.

I just did these tests again, and confirmed that alignment <= 2048
will not boot on x86 HW, while 4096 and bigger will work.

The reason is this SUBALIGN() has privilege over the alignment
inside the data sections. like in vmlinux.lds:

.data : AT(ADDR(.data) - 0xffffffff80000000)
SUBALIGN(2048)
{
_sdata = .;
. = ALIGN(((1 << 12) << (2 + 0))); __start_init_task = .; init_thread_union = .; init_stack = .; KEEP(*(.data..init_task)) KEEP(*(.data..init_thread_info)) . = __start_init_task + ((1 << 12) << (2 + 0)); __end_init_task = .;
. = ALIGN((1 << 12)); *(.data..page_aligned) . = ALIGN((1 << 12));
...

And the section of *(.data..page_aligned) asks for 4KB align but
our 2048 setting will break it, as confirmed by the System.map:

ffffffff82800000 D _sdata
ffffffff82800000 D init_stack
ffffffff82800000 D init_thread_union
ffffffff82804000 D __end_init_task
ffffffff82804000 d bringup_idt_table
ffffffff82804800 D __vsyscall_page <--------- broken
ffffffff82806000 d hpet

But this also means if necessary, we can change the alignment
from THREAD_SIZE to PAGE_SIZE, which will make data sections
much smaller.

Thanks,
Feng

> So I
> chose the biggest alignment within data sections, considering
> this is purely for analyzing/debugging purpose.