Re: arm64: perf test 26 rpi4 oops

From: Lorenzo Stoakes
Date: Mon Jul 31 2023 - 17:54:28 EST


On Mon, 31 Jul 2023 at 22:08, Lorenzo Stoakes <lstoakes@xxxxxxxxx> wrote:
>
> On Mon, 31 Jul 2023 at 12:52, Will Deacon <will@xxxxxxxxxx> wrote:
> >
> > On Mon, Jul 31, 2023 at 11:43:40AM +0100, Will Deacon wrote:
> > > [+Lorenzo, Kefeng and others]
> > >
> > > On Sun, Jul 30, 2023 at 06:09:15PM +0200, Mike Galbraith wrote:
> > > > On Fri, 2023-07-28 at 15:18 +0100, Will Deacon wrote:
> > > > >
> > > > > Looking at this quickly with Mark, the most likely explanation is that
> > > > > a bogus kernel address is being passed as the source pointer to
> > > > > copy_to_user().
> > > >
> > > > 'start' in read_kcore_iter() is bogus a LOT when running perf test 26,
> > > > and that back to at least 5.15. Seems removal of bogon-proofing gave a
> > > > toothless old bug teeth, but seemingly only to perf test 26. Rummaging
> > > > around with crash vmlinux /proc/kcore seems to be bogon free anyway.
> > > >
> > > > Someone should perhaps take a peek at perf. Bogons aside, it also
> > > > doesn't seem to care deeply about kernel response. Whether the kernel
> > > > oops or I bat 945 bogons aside, it says 'OK'. That seems a tad odd.
> > >
> > > Aha, so I think I triggered the issue you're seeing under QEMU (log
> > > below). perf (unhelpfully) doesn't have stable test numbers, so it's
> > > test 21 in my case. However, it only explodes if I run it as root, since
> > > /proc/kcore is 0400 on my system.
> > >
> > > The easiest way to trigger the problem is simply:
> > >
> > > # objdump -d /proc/kcore
> > >
> > > Looking at the history, I wonder whether this is because of a combination
> > > of:
> > >
> > > e025ab842ec3 ("mm: remove kern_addr_valid() completely")
> > >
> > > which removed the kern_addr_valid() check on the basis that kcore used
> > > copy_from_kernel_nofault() anyway, and:
> > >
> > > 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data")
> > >
> > > which replaced the copy_from_kernel_nofault() with _copy_to_user().
> > >
> > > So with both of those applied, we're missing the address check on arm64.
> >
> > Digging into this a little more, the fault occurs because kcore is
> > treating everything from '_text' to '_end' as KCORE_TEXT and expects it
> > to be mapped linearly. However, there's plenty of stuff we _don't_ map
> > in that range on arm64 (e.g. .head.text, the pKVM hypervisor, the entry
> > trampoline) so kcore is broken.
> >
> > One hack is to limit KCORE_TEXT to actually point at the kernel text
> > (see below), but this is a user-visible change in behaviour for things
> > like .data so I think it would be better to restore the old behaviour
> > of handling the faults.
> >
> > Lorenzo?
>
> FYI there is a parallel discussion at
> https://lore.kernel.org/all/ZHc2fm+9daF6cgCE@krava/ :)
>
> [sorry lei isn't playing ball so will have to reply from gmail,
> apologies if this breaks formatting]
>
> It'd be a real pity to have to revert that behaviour, as using a
> bounce buffer is such a hack and means you have to iterate through a
> page at a time...
> Either that or a change such that for KCORE_TEXT specifically we
> reinstate the bounce buffer and use copy_from_kernel_nofault().
>
> It definitely is a bug in kcore to have ranges of memory that are not
> mapped marked as readable. What kind of behaviour changes do you
> anticipate exactly with your prospective change re: .data? The
> fallthroughs?
>
> kcore as a whole needs some love and attention I think.
>
> An alternative is to implement some version of
> copy_from_kernel_nofault() in the iterator code.
>
> However TL;DR - I think we probably do need a semi-revert and to just
> make the ktext do a bounce buffer thing. I definitely want to keep the
> use of iterators so I would really not want to revert anything else.
>
>
> >
> > Will
> >
> > --->8
> >
> > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> > index 9cb32e1a78a0..3696a209c1ec 100644
> > --- a/fs/proc/kcore.c
> > +++ b/fs/proc/kcore.c
> > @@ -635,7 +635,7 @@ static struct kcore_list kcore_text;
> >   */
> >  static void __init proc_kcore_text_init(void)
> >  {
> > -	kclist_add(&kcore_text, _text, _end - _text, KCORE_TEXT);
> > +	kclist_add(&kcore_text, _stext, _etext - _stext, KCORE_TEXT);
> >  }
> >  #else
> >  static void __init proc_kcore_text_init(void)
> >
>
>

I have posted a fix at:

https://lore.kernel.org/all/20230731215021.70911-1-lstoakes@xxxxxxxxx/

Please give it a try and let me know whether it resolves the issue.
It is in effect a partial revert: reads of KCORE_TEXT go back to using
copy_from_kernel_nofault(), so that reading an unmapped region no
longer faults. Sadly this means reinstating the bounce buffer.

Hopefully this is close to the smallest change we can make to resolve
the problem.
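
For anyone following along, the idea can be sketched in userspace as
below. This is only an illustration under stated assumptions, not the
posted patch: sim_copy_nofault() is a hypothetical stand-in for
copy_from_kernel_nofault(), the tiny page size and the mapped/unmapped
layout are made up, and zero-filling the unreadable ranges is an
assumption about what the reader gets back when a copy fails.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 16u	/* tiny "page" to keep the example small */
#define SIM_NPAGES    4u

/* Hypothetical layout: page 1 is a hole, like an unmapped part of the image. */
static const int sim_mapped[SIM_NPAGES] = { 1, 0, 1, 1 };

/*
 * Stand-in for copy_from_kernel_nofault(): reports -EFAULT instead of
 * crashing when any byte of the range is unmapped. Mapped bytes hold the
 * value (offset + 1) so the result is easy to check.
 */
static int sim_copy_nofault(unsigned char *dst, size_t off, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		size_t page = (off + i) / SIM_PAGE_SIZE;

		if (page >= SIM_NPAGES || !sim_mapped[page])
			return -EFAULT;
	}
	for (i = 0; i < len; i++)
		dst[i] = (unsigned char)(off + i + 1);
	return 0;
}

/*
 * Read [off, off + len) at most one page at a time through a bounce
 * buffer. A chunk that cannot be read is zero-filled in the output
 * rather than aborting the whole read. Returns the bytes produced.
 */
static size_t sim_read_ktext(unsigned char *out, size_t off, size_t len)
{
	unsigned char bounce[SIM_PAGE_SIZE];
	size_t done = 0;

	while (done < len) {
		size_t cur = off + done;
		/* Never let a chunk cross a page boundary. */
		size_t chunk = SIM_PAGE_SIZE - (cur % SIM_PAGE_SIZE);

		if (chunk > len - done)
			chunk = len - done;

		if (sim_copy_nofault(bounce, cur, chunk))
			memset(out + done, 0, chunk);	/* hole: hand back zeroes */
		else
			memcpy(out + done, bounce, chunk);
		done += chunk;
	}
	return done;
}
```

Calling sim_read_ktext() over all four pages returns the full length,
with the second page zeroed and the rest intact. The per-page loop is
exactly the cost I was grumbling about above, but it is what lets a
single bad page be skipped without taking down the whole read.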