Re: [PATCH] ARC: prevent showing irrelevant exception info in signal message

From: Eugeniy Paltsev
Date: Fri Jul 06 2018 - 09:05:20 EST


Hi Vineet,

On Thu, 2018-07-05 at 14:26 -0700, Vineet Gupta wrote:
> On 07/03/2018 03:57 AM, Eugeniy Paltsev wrote:
> > On Mon, 2018-07-02 at 10:57 -0700, Vineet Gupta wrote:
> > > +CC Al
> > >
> > > On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
> > > > We process signals in the end of syscall/exception handler.
> > > > It the signal is fatal we print register's content using
> > > > show_regs function. show_regs() also prints information about
> > > > last exception happened.
> > > >
> > > > In case of multicore system we can catch the situation when we
> > > > will print wrong information about exception. See the example:
> > > > ______________________________
> > > > CPU-0: started to handle page fault
> > > > CPU-1: sent signal to process, which is executed on CPU-0
> > > > CPU-0: ended page fault handle. Started to process signal before
> > > > returnig to userspace. Process signal, which is send from
> > > > CPU-0. As th signal is fatal we call show_regs().
> > > > show_regs() will show information about last exception
> > > > which is *page fault* (instead of "trap" which is used for
> > > > signals and happened on CPU-0)
> > > >
> > > > So we will get message like this:
> > > > /home/waitpid02
> > > > potentially unexpected fatal signal 8.
> > > > Path: /home/waitpid02
> > > > CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
> > > > task: 9f11c200 task.stack: 9f3ae000
> > > >
> > > > [ECR ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
> > > > [EFA ]: 0x00000000
> > > > [BLINK ]: 0x123ea
> > > > [ERET ]: 0x123ec
> > > > @off 0x123ec in [/home/waitpid02]
> > > > VMA: 0x00010000 to 0x00016000
> > > > [STAT32]: 0x80080882 : IE U
> > > > BTA: 0x000123ea SP: 0x5ffd3db0 FP: 0x00000000
> > > > LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
> > > > [-----other-info-----]
> > > >
> > > > This message is confusing because it show information about page fault
> > > > ( [ECR ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> > > > to signal.
> > >
> > > Agreed this is misleading. @Al, is there a way to identify process termination
> > > from signals because it did something wrong vs. say unhandled signal. For former,
> > > we want to dump additional info in show_regs() such as PC / Fault addres etc and
> > > not in other scenario.
> > >
> > > > This situation was reproduced with waitpid02 LTP test.
> > > > _____________________________
> > > >
> > > > So remove printing information about exceptions from show_regs()
> > > > to avoid confusing messages. Print information about exceptions
> > > > only in required places instead of show_regs()
> > > >
> > > > Now we don't print information about exceptions if signal is simply
> > > > send by another userspace app. So in case of waitpid02 we will print
> > > > next message:
> > > > _____________________________
> > > > ./waitpid02
> > > > potentially unexpected fatal signal 8.
> > > > [STAT32]: 0x80080082 : IE U
> > > > BTA: 0x20000fc4 SP: 0x5ff8bd64 FP: 0x00000000
> > > > LPS: 0x200524a0 LPE: 0x200524b6 LPC: 0x00000006
> > > > [-----other-info-----]
> > > > _____________________________
> > >
> > > The prints I'm seeing now, for a segv from NULL pointer access is even more
> > > confusing !
> > > There's a mixup of prints....
> > >
> > > -------------------->8--------------------
> > > Path: /segv
> > > CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412
> > >
> > > [ECR ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103ac
> > > [EFA ]: 0x00000000
> > > [BLINK ]: 0x20047bb0
> > > [ERET ]: 0x103ac
> > > @off 0x103ac in [/segv]
> > > VMA: 0x00010000 to 0x00012000
> > >
> > > potentially unexpected fatal signal 11.
> > > [STAT32]: 0x80080882 : IE U
> > > BTA: 0x00010398 SP: 0x5fc95e1c FP: 0x5fc95e20
> > > LPS: 0x20039ffc LPE: 0x2003a000 LPC: 0x00000000
> > > r00: 0x00000001 r01: 0x5fc95e94 r02: 0x00000000
> > > r03: 0x00000064 r04: 0x80808080 r05: 0x2f2f2f2f
> > > ...
> > > -------------------->8--------------------
> > >
> > > and for the process killed by signal 8, we get below.
> > >
> > > -------------------->8--------------------
> > > [ARCLinux]# kill -8 71
> > > [ARCLinux]# potentially unexpected fatal signal 8.
> > > [STAT32]: 0x80080882 : IE U
> > > BTA: 0x20020660 SP: 0x5fbcddec FP: 0x5fbcde1c
> > > LPS: 0x20039ffc LPE: 0x2003a000 LPC: 0x00000000
> > > r00: 0xfffffdfc r01: 0x5fbcddf0 r02: 0x00000000
> > > r03: 0x00000008 r04: 0x80808080 r05: 0x2f2f2f2f
> > > r06: 0x7a2f5f4a r07: 0x00000000 r08: 0x00000065
> > > ...
> > >
> > >
> > > [1]+ Floating point exception ./sleep
> > > -------------------->8--------------------
> > > I'm not sure whats the improvement here vs. the status quo.
> >
> > Why do you think this is confusing?
> > The main change is that we don't print exception registers for signal based kill.
>
> For the pure signal based termination, what is the point of printing the rest of
> registers. If you say "it is to give a feel of what the userspace was doing at the
> time...." then we are lacking the most crucial piece which is the PC at the time
> (i.e. ERET placeholder).

We can disable printing the rest of registers by setting "print-fatal-signals" to 0
by default. In that case we will print rest of registers only for exceptions
happened in kernel space (when die() function is called)

So in case of setting "print-fatal-signals" to 0 we will get following messages:

NULL pointer access from user space:
---------->8-------------
# ./arc_hell &
# Path: /root/arc_hell
CPU: 0 PID: 79 Comm: arc_hell Not tainted 4.17.0+ #10

[ECR ]: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c
[EFA ]: 0x00000000
[BLINK ]: 0x20039ef8
[ERET ]: 0x2003a35c
@off 0x2e35c in [/lib/libuClibc-1.0.18.so]
VMA: 0x2000c000 to 0x20072000

[1]+ Segmentation fault ./arc_hell
---------->8-------------

Process killed by signal 11 via 'kill -s 11':
---------->8-------------
# sleep 1000 &
[1]+ Segmentation fault sleep 1000
---------->8-------------



Probably the best variant is to print only information about unexpected exception
from exception handler and leave printing ECR/EFA/BLINK/ERET in show_regs().

So in case of exception based signal we will print from exception handler
something like that:
---------->8-------------
Unexpected exception: Invalid Read @ 0x00000000 by insn @ 0x2003a35c
---------->8-------------

Other information will be print in show_regs (only if "print-fatal-signals"
is enabled):
---------->8-------------
potentially unexpected fatal signal 11.
Path: /root/arc_hell
CPU: 0 PID: 79 Comm: arc_hell Not tainted 4.17.0+ #10
[ECR ]: 0x00050100
[EFA ]: 0x00000000
[BLINK ]: 0x20039ef8
[ERET ]: 0x2003a35c
@off 0x2e35c in [/lib/libuClibc-1.0.18.so]
VMA: 0x2000c000 to 0x20072000
[STAT32]: 0x80080882 : IE U
BTA: 0x00010398 SP: 0x5fc95e1c FP: 0x5fc95e20
LPS: 0x20039ffc LPE: 0x2003a000 LPC: 0x00000000
r00: 0x00000001 r01: 0x5fc95e94 r02: 0x00000000
r03: 0x00000064 r04: 0x80808080 r05: 0x2f2f2f2f
...
---------->8-------------


> > Moreover, new behavior is more like x86-64 behavior. See the example:
>
> For a null pointer based termination, we now have a ugly looking "potentially
> unexpected..." print in the middle of reg file dump, that is not what x86 does.
> Anyways, that print It is undesirable, but not a deal breaker. The issue is point
> above, can we remedy it.
>
> BTW in your original patch, for a null pointer access, the printing code now
> allocates 2 pages, in each of show_xxx routines and the one in show_regs() is now
> pointless, as it is not used there at all there - so please fix that as well.

Thanks for pointing, will fix in v2 patch.

> -Vineet
>
--
Eugeniy Paltsev