RE: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())

From: David Laight
Date: Thu May 30 2019 - 05:38:31 EST


From: Eric Wong
> Sent: 29 May 2019 19:50
...
> > Personally I think that is wrong.
> > Given code like the above that has:
> > while (!interrupted) {
> > pselect(..., &sigint);
> > // process available data
> > }
> >
> > You want the signal handler to be executed even if one of the fds
> > always has available data.
> > Otherwise you can't interrupt a process that is always busy.

FWIW in the past I've seen a process that loops select-accept-fork-exec
get its priority reduced to the point where it never blocked
in select. The client side retries made it go badly wrong!
If it had limited when SIG_INT was detected it would have been
a little more difficult to kill.

> Agreed... I believe cmogstored has always had a bug in the way
> it uses epoll_pwait because it failed to check interrupts if:
>
> a) an FD is ready + interrupt
> b) epoll_pwait returns 0 on interrupt

But the kernel code seems to only call the signal handler
(for signals that are enabled during pselect() (etc)) if
the underlying wait is interrupted.

> The bug remains in userspace for a), which I will fix by adding
> an interrupt check when an FD is ready. The window is very
> small for a) and difficult to trigger, and also in a rare code
> path.
>
> The b) case is the kernel bug introduced in 854a6ed56839a40f
> ("signal: Add restore_user_sigmask()").
>
> I don't think there's any disagreement that b) is a kernel bug.

If the signal is raised after the wait has timed out but before
the signal mask is restored.

> So the confusion is for a), and POSIX is not clear w.r.t. how
> pselect/poll works when there's both FD readiness and an
> interrupt.
>
> Thus I'm inclined to believe *select/*poll/epoll_*wait should
> follow POSIX read() semantics:
>
> If a read() is interrupted by a signal before it reads any data, it shall
> return â1 with errno set to [EINTR].
>
> If a read() is interrupted by a signal after it has successfully read
> some data, it shall return the number of bytes read.

Except that above you want different semantics :-)
For read() any signal handler is always called.
And the return value of a non-blocking read that returns no data
is not affected by any pending signals.

For pselect() that would mean that if the wait timed out (result 0)
and then a signal was raised (before the mask got changed back) then
the return value would be zero and the signal handler would be called.
So your (b) above is not a bug.
Even select() can return 'timeout' and have a signal handler called.

There are really two separate issues:
1) Signals that are pending when the new mask is applied.
2) Signals that are raised after the wait terminates (success or timeout).
If the signal handlers for (2) are not called then they become (1)
next time around the application loop.

Maybe the 'restore sigmask' function should be passed an indication
of whether it is allowed to let signal handler be called and return
whether they would be called (ie whether it restored the signal mask
or left it for the syscall exit code to do after calling the signal
handlers).

That would allow epoll() to convert timeout+pending signal to EINTR,
or to allow all handlers be called regardless of the return value.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)