Re: [PATCH v4 4/7] uaccess-buffer: add CONFIG_GENERIC_ENTRY support

From: Thomas Gleixner
Date: Thu Dec 16 2021 - 08:05:47 EST


Peter,

On Wed, Dec 15 2021 at 17:25, Peter Collingbourne wrote:
> On Sat, Dec 11, 2021 at 3:50 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> This restores the signal blocked mask _after_ exit_to_user_mode_loop()
>> has completed, recalculates pending signals and goes out to user space
>> with eventually pending signals.
>>
>> How is this supposed to be even remotely correct?
>
> Please see this paragraph from the documentation:
>
> When entering the kernel with a non-zero uaccess descriptor
> address for a reason other than a syscall (for example, when
> IPI'd due to an incoming asynchronous signal), any signals other
> than ``SIGKILL`` and ``SIGSTOP`` are masked as if by calling
> ``sigprocmask(SIG_SETMASK, set, NULL)`` where ``set`` has been
> initialized with ``sigfillset(set)``. This is to prevent incoming
> signals from interfering with uaccess logging.
>
> I believe that we will also go out to userspace with pending signals
> when one of the signals that came in was a masked (via sigprocmask)
> asynchronous signal, so this is an expected state.

Believe is not part of a technical analysis, believe belongs into the
realm of religion.

It's a fundamental difference whether the program masks signals itself
or the kernel decides to do that just because.

Pending signals, which are not masked by the process, have to be
delivered _before_ returning to user space.

That's the expected behaviour. Period.

Instrumentation which changes the semantics of the observed code is
broken by definition.

>> So what is this pre/post exit loop code about? Handle something which
>> cannot happen in the first place?
>
> The pre/post_exit_loop() functions are checking UACCESS_BUFFER_ENTRY,
> which is set when prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR) has been
> used to set the uaccess descriptor address address to a non-zero
> value. It is a different flag from UACCESS_BUFFER_EXIT. It is
> certainly possible for the ENTRY flag to be set in your 2) above,
> since that flag is not normally modified while inside the kernel.

Let me try again. The logger is only active when:

1) PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR has set an address, which
sets UACCESS_BUFFER_ENTRY

2) The task enters the kernel via syscall and the syscall entry
observes UACCESS_BUFFER_ENTRY and sets UACCESS_BUFFER_EXIT

because the log functions only log when UACCESS_BUFFER_EXIT is set.

UACCESS_BUFFER_EXIT is cleared in the syscall exit path _before_ the
exit to usermode loop is reached, which means signal delivery is _NOT_
logged at all.

A non-syscall entry from user space - interrupt, exception, NMI - will
_NOT_ set UACCESS_BUFFER_EXIT because it takes a different entry
path. So when that non-syscall entry returns and delivers a signal then
there is no logging.

When the task has entered the kernel via a syscall and the kernel gets
interrupted and that interruption raises a signal, then there is no
signal delivery. The interrupt returns to kernel mode, which obviously
does not go through exit_to_user_mode(). The raised signal is delivered
when the task returns from the syscall to user mode, but that won't be
logged because UACCESS_BUFFER_EXIT is already cleared before the exit to
user mode loop is reached.

See?

>> then we have a very differrent understanding of what documentation
>> should provide.
>
> This was intended as interface documentation, so it doesn't go into
> too many details. It could certainly be improved though by referencing
> the user documentation, as I mentioned above.

Explanations which are required to make the code understandable have to
be in the code/kernel-doc comments and not in some disjunct place. This
disjunct documentation is guaranteed to be out of date within no time.

Thanks,

tglx