Re: [RFC PATCH v2] Documentation/page_tables: Add info about MMU/TLB and Page Faults

From: Fabio M. De Francesco
Date: Mon Jul 24 2023 - 07:21:51 EST


On Monday, 24 July 2023 11:55:05 CEST Jonathan Cameron wrote:
> On Sun, 23 Jul 2023 14:07:09 +0200
>
> "Fabio M. De Francesco" <fmdefrancesco@xxxxxxxxx> wrote:
> > Extend page_tables.rst by adding a section about the role of MMU and TLB
> > in translating between virtual addresses and physical page frames.
> > Furthermore explain the concept behind Page Faults and how the Linux
> > kernel handles TLB misses. Finally briefly explain how and why to disable
> > the page faults handler.
> >
> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Bagas Sanjaya <bagasdotme@xxxxxxxxx>
> > Cc: Ira Weiny <ira.weiny@xxxxxxxxx>
> > Cc: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > Cc: Jonathan Corbet <corbet@xxxxxxx>
> > Cc: Linus Walleij <linus.walleij@xxxxxxxxxx>
> > Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> > Cc: Mike Rapoport <rppt@xxxxxxxxxx>
> > Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
> > Signed-off-by: Fabio M. De Francesco <fmdefrancesco@xxxxxxxxx>
>
> Hi Fabio,
>

Hi Jonathan,

> Some superficial comments...

They may be "superficial", but they are very welcome all the same :-)

> > ---
> >
> > v1->v2: Add further information about lower level functions in the page
> > fault handler and add information about how and why to disable / enable
> > the page fault handler (provided a link to a Ira's patch that make use
> > of pagefault_disable() to prevent deadlocks).
> >
> > This is an RFC PATCH because of two reasons:
> >
> > 1) I've heard that there is consensus about the need to revise and
> > extend the MM documentation, but I'm not sure about whether or not
> > developers need these kind of introductory information.
> >
> > 2) While preparing this little patch I decided to take a quicj look at
>
> Spell check your intro text.

Sure, I'll s/quicj/quick/

> > the code and found out it currently is not how I thought I remembered
> > it. I'm especially speaking about the x86 case. I'm not sure that I've
> > been able to properly understand what I described as a difference in
> > workflow compared to most of the other architecture.
> >
> > Therefore, for the two reasons explained above, I'd like to hear from
> > people actively involved in MM. If this is not what you want, feel free
> > to throw it away. Otherwise I'd be happy to write more on this and other
> > MM topics. I'm looking forward for comments on this small work.
> >
> > Documentation/mm/page_tables.rst | 87 ++++++++++++++++++++++++++++++++
> > 1 file changed, 87 insertions(+)
> >
> > diff --git a/Documentation/mm/page_tables.rst b/Documentation/mm/page_tables.rst
> > index 7840c1891751..2be56f50c88f 100644
> > --- a/Documentation/mm/page_tables.rst
> > +++ b/Documentation/mm/page_tables.rst
> > @@ -152,3 +152,90 @@ Page table handling code that wishes to be
> > architecture-neutral, such as the
> > virtual memory manager, will need to be written so that it traverses all of
> > the currently five levels. This style should also be preferred for
> > architecture-specific code, so as to be robust to future changes.
> >
> > +
> > +
> > +MMU, TLB, and Page Faults
> > +=========================
> > +
> > +The Memory Management Unit (MMU) is a hardware component that handles virtual to
> > +physical address translations. It uses a relatively small cache in hardware
> It may use a relatively...
> (I doubt Linux supports anything that doesn't have a TLB but they aren't
> required by some architectures - just a performance optimization that you
> 'can' add to an implementation.)

Oh, I didn't know that Linux supports non-MMU architectures. However, I
suspect that the vast majority have both an MMU and a TLB. Is that correct?

I'll change the statement to "it may use, and in the vast majority of
supported architectures it indeed uses [...]". How about this? Or is that
still not what you meant?

> > +called the Translation Lookaside Buffer (TLB) to speed up these translations.
> > +When a process wants to access a memory location, the CPU provides a virtual
> > +address to the MMU, which then uses the TLB to quickly find the corresponding
> > +physical address.
> > +
> > +However, sometimes the MMU can't find a valid translation in the TLB. This
> > +could be because the process is trying to access a range of memory that it's not
> > +allowed to, or because the memory hasn't been loaded into RAM yet.
>
> It might not find it because this is first attempt to do the translation and
> the MMU hasn't filled the TLB entry yet, or a capacity eviction has happened.

I thought that "[...] hasn't been loaded into RAM yet" would cover all the
cases, including lazy allocation, copy-on-write, and swapped-out pages. I
talk about the first two later in the text, but I forgot to mention page
frames swapped out to persistent storage, so I'll add that in the next
version.

However, I now think that it is not the TLB miss itself that causes an
exception to fault in memory, but rather the MMU when it is unable to fill the
TLB from the contents of the allocated page tables. If so, I'll need to
rewrite the first introductory paragraphs. Can you please confirm?

> Basically failure to find it in the TLB doesn't mean we get a page fault
> (unless you are on an ancient architecture where TLB entries are software
> filled which is definitely not the case for most modern ones).

Let me summarize, so that you can confirm or deny that I have understood
correctly...

1) TLB misses don't cause page faults unless the MMU is unable to find the
entries in the hierarchy of page tables. If it finds the entries, it
transparently refills the TLB with the translations it found.

2) Page faults happen only if the MMU, after walking the hierarchy, still
cannot find a suitable translation.
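
To make points 1) and 2) concrete, here is a toy model in Python of the flow
as I currently understand it. It is only a sketch under loud assumptions: a
flat, single-level page table kept in a dict (real hardware walks several
levels), a tiny FIFO TLB, 4 KiB pages, and hardware-filled TLB entries. The
names ToyMMU, PageFault, and translate() are made up for illustration and do
not correspond to any kernel or hardware interface.

```python
from collections import OrderedDict

PAGE_SHIFT = 12                  # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


class PageFault(Exception):
    """Raised when the page table walk finds no valid translation."""


class ToyMMU:
    """Hypothetical model: a flat page table plus a small FIFO TLB."""

    def __init__(self, page_table, tlb_entries=2):
        self.page_table = page_table      # virtual page number -> frame number
        self.tlb = OrderedDict()          # cached subset of page_table
        self.tlb_entries = tlb_entries
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & (PAGE_SIZE - 1)

        if vpn in self.tlb:               # TLB hit: no walk needed
            self.tlb_hits += 1
            return (self.tlb[vpn] << PAGE_SHIFT) | offset

        self.tlb_misses += 1              # TLB miss: walk the page table
        if vpn not in self.page_table:
            # Only a failed walk raises a fault, not the miss itself.
            raise PageFault(hex(vaddr))

        if len(self.tlb) >= self.tlb_entries:
            self.tlb.popitem(last=False)  # capacity eviction (oldest entry)
        self.tlb[vpn] = self.page_table[vpn]   # transparent refill
        return (self.tlb[vpn] << PAGE_SHIFT) | offset
```

With a table such as ToyMMU({0x1: 0x80, 0x2: 0x81}), the first translate()
of an address in page 0x1 misses the TLB, the walk succeeds, and the entry is
refilled without any fault. A second access to the same page hits the TLB.
Only an address in an unmapped page (for example 0x5000) raises PageFault.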

> > When this
> >
> > +happens, the MMU triggers a page fault, which is a type of interrupt that
>
> Hmm. Whilst similar to an interrupt I'd argue that it's not one..

3) I shouldn't define it as an "interrupt" because technically it is not one.
How about "exception" or "software exception"?

> > +signals the CPU to pause the current process and run a special function to
> > +handle the fault.
>
> ...
>
> Jonathan

I don't see any comments on the second part of the RFC. Does that mean
that the second part is OK from your POV?

It would be of great help if you could spare a few more minutes to clear up
the doubts I expressed above and answer the questions I asked :-)

Thanks for the comments,

Fabio