Re: [PATCH 04/27] Documentation: x86: convert exception-tables.txt to reST

From: Changbin Du
Date: Wed May 01 2019 - 23:19:44 EST


On Sat, Apr 27, 2019 at 11:48:26AM -0300, Mauro Carvalho Chehab wrote:
> Em Fri, 26 Apr 2019 23:31:27 +0800
> Changbin Du <changbin.du@xxxxxxxxx> escreveu:
>
> > This converts the plain text documentation to reStructuredText format and
> > add it to Sphinx TOC tree. No essential content change.
> >
> > Signed-off-by: Changbin Du <changbin.du@xxxxxxxxx>
> > ---
> > ...eption-tables.txt => exception-tables.rst} | 231 ++++++++++--------
> > Documentation/x86/index.rst | 1 +
> > 2 files changed, 126 insertions(+), 106 deletions(-)
> > rename Documentation/x86/{exception-tables.txt => exception-tables.rst} (67%)
> >
> > diff --git a/Documentation/x86/exception-tables.txt b/Documentation/x86/exception-tables.rst
> > similarity index 67%
> > rename from Documentation/x86/exception-tables.txt
> > rename to Documentation/x86/exception-tables.rst
> > index e396bcd8d830..2ffb096c8b58 100644
> > --- a/Documentation/x86/exception-tables.txt
> > +++ b/Documentation/x86/exception-tables.rst
> > @@ -1,5 +1,10 @@
> > - Kernel level exception handling in Linux
> > - Commentary by Joerg Pommnitz <joerg@xxxxxxxxxxxxxxx>
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Kernel level exception handling
> > +===============================
> > +
> > +Commentary by Joerg Pommnitz <joerg@xxxxxxxxxxxxxxx>
> >
> > When a process runs in kernel mode, it often has to access user
> > mode memory whose address has been passed by an untrusted program.
> > @@ -25,9 +30,9 @@ How does this work?
> >
> > Whenever the kernel tries to access an address that is currently not
> > accessible, the CPU generates a page fault exception and calls the
> > -page fault handler
> > +page fault handler::
> >
> > -void do_page_fault(struct pt_regs *regs, unsigned long error_code)
> > + void do_page_fault(struct pt_regs *regs, unsigned long error_code)
> >
> > in arch/x86/mm/fault.c. The parameters on the stack are set up by
> > the low level assembly glue in arch/x86/kernel/entry_32.S. The parameter
> > @@ -57,73 +62,74 @@ as an example. The definition is somewhat hard to follow, so let's peek at
> > the code generated by the preprocessor and the compiler. I selected
> > the get_user call in drivers/char/sysrq.c for a detailed examination.
> >
> > -The original code in sysrq.c line 587:
> > +The original code in sysrq.c line 587::
> > +
> > get_user(c, buf);
> >
> > -The preprocessor output (edited to become somewhat readable):
> > -
> > -(
> > - {
> > - long __gu_err = - 14 , __gu_val = 0;
> > - const __typeof__(*( ( buf ) )) *__gu_addr = ((buf));
> > - if (((((0 + current_set[0])->tss.segment) == 0x18 ) ||
> > - (((sizeof(*(buf))) <= 0xC0000000UL) &&
> > - ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))
> > - do {
> > - __gu_err = 0;
> > - switch ((sizeof(*(buf)))) {
> > - case 1:
> > - __asm__ __volatile__(
> > - "1: mov" "b" " %2,%" "b" "1\n"
> > - "2:\n"
> > - ".section .fixup,\"ax\"\n"
> > - "3: movl %3,%0\n"
> > - " xor" "b" " %" "b" "1,%" "b" "1\n"
> > - " jmp 2b\n"
> > - ".section __ex_table,\"a\"\n"
> > - " .align 4\n"
> > - " .long 1b,3b\n"
> > - ".text" : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *)
> > - ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ;
> > - break;
> > - case 2:
> > - __asm__ __volatile__(
> > - "1: mov" "w" " %2,%" "w" "1\n"
> > - "2:\n"
> > - ".section .fixup,\"ax\"\n"
> > - "3: movl %3,%0\n"
> > - " xor" "w" " %" "w" "1,%" "w" "1\n"
> > - " jmp 2b\n"
> > - ".section __ex_table,\"a\"\n"
> > - " .align 4\n"
> > - " .long 1b,3b\n"
> > - ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
> > - ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err ));
> > - break;
> > - case 4:
> > - __asm__ __volatile__(
> > - "1: mov" "l" " %2,%" "" "1\n"
> > - "2:\n"
> > - ".section .fixup,\"ax\"\n"
> > - "3: movl %3,%0\n"
> > - " xor" "l" " %" "" "1,%" "" "1\n"
> > - " jmp 2b\n"
> > - ".section __ex_table,\"a\"\n"
> > - " .align 4\n" " .long 1b,3b\n"
> > - ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
> > - ( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err));
> > - break;
> > - default:
> > - (__gu_val) = __get_user_bad();
> > - }
> > - } while (0) ;
> > - ((c)) = (__typeof__(*((buf))))__gu_val;
> > - __gu_err;
> > - }
> > -);
> > +The preprocessor output (edited to become somewhat readable)::
> > +
> > + (
> > + {
> > + long __gu_err = - 14 , __gu_val = 0;
> > + const __typeof__(*( ( buf ) )) *__gu_addr = ((buf));
> > + if (((((0 + current_set[0])->tss.segment) == 0x18 ) ||
> > + (((sizeof(*(buf))) <= 0xC0000000UL) &&
> > + ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))
> > + do {
> > + __gu_err = 0;
> > + switch ((sizeof(*(buf)))) {
> > + case 1:
> > + __asm__ __volatile__(
> > + "1: mov" "b" " %2,%" "b" "1\n"
> > + "2:\n"
> > + ".section .fixup,\"ax\"\n"
> > + "3: movl %3,%0\n"
> > + " xor" "b" " %" "b" "1,%" "b" "1\n"
> > + " jmp 2b\n"
> > + ".section __ex_table,\"a\"\n"
> > + " .align 4\n"
> > + " .long 1b,3b\n"
> > + ".text" : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *)
> > + ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ;
> > + break;
> > + case 2:
> > + __asm__ __volatile__(
> > + "1: mov" "w" " %2,%" "w" "1\n"
> > + "2:\n"
> > + ".section .fixup,\"ax\"\n"
> > + "3: movl %3,%0\n"
> > + " xor" "w" " %" "w" "1,%" "w" "1\n"
> > + " jmp 2b\n"
> > + ".section __ex_table,\"a\"\n"
> > + " .align 4\n"
> > + " .long 1b,3b\n"
> > + ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
> > + ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err ));
> > + break;
> > + case 4:
> > + __asm__ __volatile__(
> > + "1: mov" "l" " %2,%" "" "1\n"
> > + "2:\n"
> > + ".section .fixup,\"ax\"\n"
> > + "3: movl %3,%0\n"
> > + " xor" "l" " %" "" "1,%" "" "1\n"
> > + " jmp 2b\n"
> > + ".section __ex_table,\"a\"\n"
> > + " .align 4\n" " .long 1b,3b\n"
> > + ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
> > + ( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err));
> > + break;
> > + default:
> > + (__gu_val) = __get_user_bad();
> > + }
> > + } while (0) ;
> > + ((c)) = (__typeof__(*((buf))))__gu_val;
> > + __gu_err;
> > + }
> > + );
> >
> > WOW! Black GCC/assembly magic. This is impossible to follow, so let's
> > -see what code gcc generates:
> > +see what code gcc generates::
> >
> > > xorl %edx,%edx
> > > movl current_set,%eax
> > @@ -154,7 +160,7 @@ understand. Can we? The actual user access is quite obvious. Thanks
> > to the unified address space we can just access the address in user
> > memory. But what does the .section stuff do?????
> >
> > -To understand this we have to look at the final kernel:
> > +To understand this we have to look at the final kernel::
> >
> > > objdump --section-headers vmlinux
> > >
> > @@ -181,7 +187,7 @@ To understand this we have to look at the final kernel:
> >
> > There are obviously 2 non standard ELF sections in the generated object
> > file. But first we want to find out what happened to our code in the
> > -final kernel executable:
> > +final kernel executable::
> >
> > > objdump --disassemble --section=.text vmlinux
> > >
> > @@ -199,7 +205,7 @@ final kernel executable:
> > The whole user memory access is reduced to 10 x86 machine instructions.
> > The instructions bracketed in the .section directives are no longer
> > in the normal execution path. They are located in a different section
> > -of the executable file:
> > +of the executable file::
> >
> > > objdump --disassemble --section=.fixup vmlinux
> > >
> > @@ -207,14 +213,15 @@ of the executable file:
> > > c0199ffa <.fixup+10ba> xorb %dl,%dl
> > > c0199ffc <.fixup+10bc> jmp c017e7a7 <do_con_write+e3>
> >
> > -And finally:
> > +And finally::
> > +
> > > objdump --full-contents --section=__ex_table vmlinux
> > >
> > > c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0 ................
> > > c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0 ................
> > > c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0 ................
> >
> > -or in human readable byte order:
> > +or in human readable byte order::
> >
> > > c01aa7c4 c017c093 c0199fe0 c017c097 c017c099 ................
> > > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................
> > @@ -222,18 +229,22 @@ or in human readable byte order:
> > this is the interesting part!
> > > c01aa7e4 c0180a08 c019a001 c0180a0a c019a004 ................
> >
> > -What happened? The assembly directives
> > +What happened? The assembly directives::
> >
> > -.section .fixup,"ax"
> > -.section __ex_table,"a"
> > + .section .fixup,"ax"
> > + .section __ex_table,"a"
> >
> > told the assembler to move the following code to the specified
> > -sections in the ELF object file. So the instructions
> > -3: movl $-14,%eax
> > - xorb %dl,%dl
> > - jmp 2b
> > -ended up in the .fixup section of the object file and the addresses
> > +sections in the ELF object file. So the instructions::
> > +
> > + 3: movl $-14,%eax
> > + xorb %dl,%dl
> > + jmp 2b
> > +
> > +ended up in the .fixup section of the object file and the addresses::
> > +
> > .long 1b,3b
> > +
> > ended up in the __ex_table section of the object file. 1b and 3b
> > are local labels. The local label 1b (1b stands for next label 1
> > backward) is the address of the instruction that might fault, i.e.
> > @@ -246,35 +257,39 @@ the fault, in our case the actual value is c0199ff5:
> > the original assembly code: > 3: movl $-14,%eax
> > and linked in vmlinux : > c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax
> >
> > -The assembly code
> > +The assembly code::
> > +
> > > .section __ex_table,"a"
> > > .align 4
> > > .long 1b,3b
> >
> > -becomes the value pair
> > +becomes the value pair::
> > +
> > > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................
> > ^this is ^this is
> > 1b 3b
> > +
> > c017e7a5,c0199ff5 in the exception table of the kernel.
> >
> > So, what actually happens if a fault from kernel mode with no suitable
> > vma occurs?
> >
> > -1.) access to invalid address:
> > - > c017e7a5 <do_con_write+e1> movb (%ebx),%dl
> > -2.) MMU generates exception
> > -3.) CPU calls do_page_fault
> > -4.) do page fault calls search_exception_table (regs->eip == c017e7a5);
> > -5.) search_exception_table looks up the address c017e7a5 in the
> > - exception table (i.e. the contents of the ELF section __ex_table)
> > - and returns the address of the associated fault handle code c0199ff5.
> > -6.) do_page_fault modifies its own return address to point to the fault
> > - handle code and returns.
> > -7.) execution continues in the fault handling code.
> > -8.) 8a) EAX becomes -EFAULT (== -14)
> > - 8b) DL becomes zero (the value we "read" from user space)
> > - 8c) execution continues at local label 2 (address of the
> > - instruction immediately after the faulting user access).
> > +#. access to invalid address::
> > +
> > + > c017e7a5 <do_con_write+e1> movb (%ebx),%dl
> > +#. MMU generates exception
> > +#. CPU calls do_page_fault
> > +#. do page fault calls search_exception_table (regs->eip == c017e7a5);
> > +#. search_exception_table looks up the address c017e7a5 in the
> > + exception table (i.e. the contents of the ELF section __ex_table)
> > + and returns the address of the associated fault handle code c0199ff5.
> > +#. do_page_fault modifies its own return address to point to the fault
> > + handle code and returns.
> > +#. execution continues in the fault handling code.
> > +#. a) EAX becomes -EFAULT (== -14)
> > + b) DL becomes zero (the value we "read" from user space)
> > + c) execution continues at local label 2 (address of the
> > + instruction immediately after the faulting user access).
> >
> > The steps 8a to 8c in a certain way emulate the faulting instruction.
> >
> > @@ -295,14 +310,15 @@ Things changed when 64-bit support was added to x86 Linux. Rather than
> > double the size of the exception table by expanding the two entries
> > from 32-bits to 64 bits, a clever trick was used to store addresses
> > as relative offsets from the table itself. The assembly code changed
> > -from:
> > - .long 1b,3b
> > -to:
> > - .long (from) - .
> > - .long (to) - .
> > +from::
> > +
> > + .long 1b,3b
> > + to:
> > + .long (from) - .
> > + .long (to) - .
> >
> > and the C-code that uses these values converts back to absolute addresses
> > -like this:
> > +like this::
> >
> > ex_insn_addr(const struct exception_table_entry *x)
> > {
> > @@ -313,15 +329,18 @@ In v4.6 the exception table entry was expanded with a new field "handler".
> > This is also 32-bits wide and contains a third relative function
> > pointer which points to one of:
> >
> > -1) int ex_handler_default(const struct exception_table_entry *fixup)
> > +1) `int ex_handler_default(const struct exception_table_entry *fixup)`
> > This is legacy case that just jumps to the fixup code
>
> You should like change the indentation, or add an extra line, as otherwise,
> it will be shown as:
>
> 1. int ex_handler_default(const struct exception_table_entry *fixup) This is legacy case that just jumps to the fixup code
> ...
>
> I would do, instead:
>
> 1) ``int ex_handler_default(const struct exception_table_entry *fixup)``
> This is legacy case that just jumps to the fixup code
>
> With would make the function name monospaced and bold, and place the
> function explanation at the next line.
>
> Same is valid for (2) and (3) below.
>
> With such change:
>
> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx>
>
Fixed all. Thanks.

>
> > -2) int ex_handler_fault(const struct exception_table_entry *fixup)
> > +
> > +2) `int ex_handler_fault(const struct exception_table_entry *fixup)`
> > This case provides the fault number of the trap that occurred at
> > entry->insn. It is used to distinguish page faults from machine
> > check.
> > -3) int ex_handler_ext(const struct exception_table_entry *fixup)
> > +
> > +3) `int ex_handler_ext(const struct exception_table_entry *fixup)`
> > This case is used for uaccess_err ... we need to set a flag
> > in the task structure. Before the handler functions existed this
> > case was handled by adding a large offset to the fixup to tag
> > it as special.
>
> > +
> > More functions can easily be added.
> > diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
> > index 2033791e53bc..c0bfd0bd6000 100644
> > --- a/Documentation/x86/index.rst
> > +++ b/Documentation/x86/index.rst
> > @@ -10,3 +10,4 @@ Linux x86 Support
> >
> > boot
> > topology
> > + exception-tables
>
>
>
> Thanks,
> Mauro

--
Cheers,
Changbin Du