RE: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

From: PLATTNER Christoph
Date: Mon Feb 01 2021 - 07:44:20 EST


Thank you very much, I appreciate your fast responses.
Thank you also for clarification, I did completely oversee
the permission settings in the segment setup and expected
the fault reaction on the PP bits in the TLB.
And I will re-read the chapters, got get deeper into this topic.

Greetings
Christoph


-----Original Message-----
From: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
Sent: Montag, 1. Februar 2021 12:39
To: PLATTNER Christoph <christoph.plattner@xxxxxxxxxxxxxxx>; Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>; Paul Mackerras <paulus@xxxxxxxxx>; Michael Ellerman <mpe@xxxxxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx; linuxppc-dev@xxxxxxxxxxxxxxxx; HAMETNER Reinhard <reinhard.hametner@xxxxxxxxxxxxxxx>; REITHER Robert - Contractor <robert.reither@xxxxxxxxxxxxxxxxxxxxxxxx>; KOENIG Werner <werner.koenig@xxxxxxxxxxxxxxx>
Subject: Re: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE



Le 01/02/2021 à 11:22, PLATTNER Christoph a écrit :
> Hello to all, and thank you very much for first and second fast response.
>
> I do not have a long history on PowerPC MMU environment, I hacked into
> this topic for about 3 months for analyzing that problem- so, sorry, if I am wrong in some points ...

Yes you are wrong on some points, sorry, see below.


>
> What I learn so far from this MPC5121e (variant of e300c4 core):
> - It uses book3s32 hash-code, but it DOES NOT provide KEY hash method, so always the
> branch "if (! Hash) ...." is taken, so, I assume that "key 0" and "key 1" setups are not
> used on this CPU (not supporting MMU_FTR_HPTE_TABLE)

hash method is not used, this is SW TLB loading that is used, but still, all the PP and Ks/Kp keys defined in the segment register are used, see e300 core reference manual §6.4.2 Page Memory Protection

> - The PP bits are NOT checked by the CPU in HW, even if set to 00, the CPU does not react.
> As far I have understood, the TLB miss routines are responsible for checking permissions.
> The TLB miss routines check the Linux PTE styled entries and generates the PP bits
> for the TLB entry. The PowerPC PP bits are never check elsewhere on that CPU models ...

PP bits ARE checked hoppefully. If it was not the case, then the TLB miss routines would install a TLB on a read, then the user could do a write without any verification being done ?

Refer to e300 Core reference Manual, §6.1.4 Memory Protection Facilities

As I explained in the patch, the problem is not that the HW doesn't check the permission. It is that user accessed been done with key 0 as programmed in the segment registers, PP 00 means RW access.

> - The PTE entries in Linux are fully "void" in sense of this CPU type, as this CPU does not
> read any PTEs from RAM (no HW support in contrast to x86 or ARM or later ppc...).

No, the PTE are read by the TLB miss exception handlers and writen into TLB entries.

>
> In summary - as far as I understand it now - we have to handle the PTE
> bits differently (Linux style) for PROT_NONE permissions - OR - we
> have to expand the permission checking like my proposed experimental
> patch. (PROT_NONE is not NUMA related only, but may not used very often ...).

Yes, expanding the permission checking is the easiest solution, hence the patch I sent out based on your proposal.

>
> Another related point:
> According e300 RM (manual) the ACCESSED bit in the PTE shall be set on
> TLB miss, as it is an indication, that page is used. In 4.4 kernel
> this write back of the _PAGE_ACCESSED bit was performed after successful permission check:
>
> bne- DataAddressInvalid /* return if access not permitted */
> ori r0,r0,_PAGE_ACCESSED /* set _PAGE_ACCESSED in pte */
> /*
> * NOTE! We are assuming this is not an SMP system, otherwise
> * we would need to update the pte atomically with lwarx/stwcx.
> */
> stw r0,0(r2) /* update PTE (accessed bit) */
> /* Convert linux-style PTE to low word of PPC-style PTE */
>
> Bit is set (ori ...) and written back (stw ...) to Linux PTE. May be,
> this is not needed, as the PTE is never seen by the PPC chip. But I do
> not understand, WHY the PAGE_ACCCESSED is used for permission check in the late 5.4 kernel (not used in 4.4 kernel):
>
> cmplw 0,r1,r3
> mfspr r2, SPRN_SDR1
> li r1, _PAGE_PRESENT | _PAGE_ACCESSED
> rlwinm r2, r2, 28, 0xfffff000
> bgt- 112f
>
> What is the reason or relevance for checking this here ?
> Was not checked in 4.4, bit or-ed afterwards, as it is accessed now.
> Do you know the reason of change on this point ?

PAGE_ACCESSED is important for memory management, linux kernel need it.

But instead of spending time at every miss to perform a write which will be a no-op in 99% of cases, we prefer bailing out to the page_fault logic when the accessed bit is not set. Then the page_fault logic will set the bit.
This also allowed to simplify the handling in __set_pte()_at function by avoiding races in the update of PTEs.

>
> Another remark to Core manual relevant for this:
> There is the reference manual for e300 core available (e300 RM). It includes
> many remarks in range of Memory Management section, that many features
> are optional or variable for dedicated implementations. On the other hand,
> the MPC5121e reference manual refers to the e300 core RM, but DOES NOT
> information, which of the optional points are there or nor. According my
> analysis, MPC5121e does not include any of the optional features.
>

Not sure what you mean. As far as I understand, that chapter tells you that some functionnalities
are optional for the powerpc architectecture, and provided (or not) by the e300 core. The MPC5121
supports all the things that are defined by e300 core.


>
> Thanks a lot for first reactions

You are welcome, don't hesitate if you have additional questions.

Christophe