Re: sysctl_writes_strict documentation + an oddity?

From: Michael Kerrisk (man-pages)
Date: Wed May 20 2015 - 10:09:19 EST


Hello Kees,

Ping on the below!

Cheers,

Michael


On 9 May 2015 at 10:54, Michael Kerrisk (man-pages)
<mtk.manpages@xxxxxxxxx> wrote:
> Hi Kees,
>
> I discovered that you added /proc/sys/kernel/sysctl_writes_strict in
> Linux 3.16. In passing, I'll just mention that was an API change that
> should have been CCed to linux-api@xxxxxxxxxxxxxxxx
>
> Anyway, I've tried to write this file up for the proc(5) man page,
> and I have two requests:
>
> 1) Could you review this text?
> 2) I've found some behavior that surprised me, and I am wondering if it
> is intended. Could you let me know your thoughts?
>
> ===== 1) man-page text =====
>
> The man-page text, heavily based on your text in
> Documentation/sysctl/kernel.txt, is as follows:
>
> /proc/sys/kernel/sysctl_writes_strict (since Linux 3.16)
> The value in this file determines how the file offset
> affects the behavior of updating entries in files under
> /proc/sys. The file has three possible values:
>
> -1 This provides legacy handling, with no printk warnâ
> ings. Each write(2) must fully contain the value to
> be written, and multiple writes on the same file
> descriptor will overwrite the entire value, regardless
> of the file position.
>
> 0 (default) This provides the same behavior as for -1,
> but printk warnings are written for processes that
> perform writes when the file offset is not 0.
>
> 1 Respect the file offset when writing strings into
> /proc/sys files. Multiple writes will append to the
> value buffer. Anything written beyond the maximum
> length of the value buffer will be ignored. Writes to
> numeric /proc/sys entries must always be at file offâ
> set 0 and the value must be fully contained in the
> buffer provided to write(2).
>
> ===== 2) Behavior puzzle (a) =====
>
> The last sentence quoted from the man page was based on your sentence
>
> Writes to numeric sysctl entries must always be at file position 0
> and the value must be fully contained in the buffer sent in the write
> syscall.
>
> So, I had interpreted /proc/sys/kernel/sysctl_writes_strict==1 to
> mean that if one writes into a numeric /proc/sys file at an offset
> other than zero, the write() will fail with some kind of error.
> But this seems not to be the case. Instead, the write() succeeds,
> but the file is left unmodified. That's surprising, I find. So, I'm
> wondering whether the implementation deviates from your intention.
>
> There's a test program below, which takes arguments as follows
>
> ./a.out pathname offset string
>
> And here's a test run that demonstrates the behavior:
>
> $ sudo sh -c "echo 1 > /proc/sys/kernel/sysctl_writes_strict"
> $ cat /proc/sys/kernel/pid_max
> 32768
> $ sudo dmesg --clear
> $ sudo ./a.out /proc/sys/kernel/pid_max 1 3000
> write() succeeded (return value 4)
> $ cat /proc/sys/kernel/pid_max
> 32768
> $ dmesg
>
> As you can see above, an attempt was made to write into the
> /proc/sys/kernel/pid_max file at offset 1.
> The write() returned successfully (reporting 4 bytes written)
> but the file contents were unchanged, and no printk() warning
> was issued. Is this intended behavior?
>
> ===== 2) Behavior puzzle (b) =====
>
> In commit f88083005ab319abba5d0b2e4e997558245493c8, there is this note:
>
> This adds the sysctl kernel.sysctl_writes_strict to control the write
> behavior. The default (0) reports when VFS position is non-0 on a
> write, but retains legacy behavior, -1 disables the warning, and 1
> enables the position-respecting behavior.
>
> The long-term plan here is to wait for userspace to be fixed in response
> to the new warning and to then switch the default kernel behavior to the
> new position-respecting behavior.
>
> (That last para was added to the commit message by AKPM, I see.)
>
> But, I wonder here whether /proc/sys/kernel/sysctl_writes_strict==0
> is going to help with the long-term plan. The problem is that in
> warn_sysctl_write(), pr_warn_once() is used. This means that only
> the first offending user-space application that writes to *any*
> /proc/sys file will generate the printk warning. If that application
> isn't fixed, then none of the other "broken" applications will be
> discovered. It therefore seems possible that it could be a very long
> time before we could "switch the default kernel behavior to the
> new position-respecting behavior".
>
> Looking over old mails
> (http://thread.gmane.org/gmane.linux.kernel/1695177/focus=23240),
> I see that you're aware of the problem, but it seems to me that
> the switch to pr_warn_once() (for fear of spamming the log) likely
> dooms the long-term plan to failure. Your thoughts?
>
> Cheers,
>
> Michael
>
>
> 8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--
>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
>
> #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
>
> int
> main(int argc, char *argv[])
> {
> char *pathname;
> off_t offset;
> char *string;
> int fd;
> ssize_t numWritten;
>
> if (argc != 4) {
> fprintf(stderr, "Usage: %s pathname offset string\n", argv[0]);
> exit(EXIT_FAILURE);
> }
>
> pathname = argv[1];
> offset = strtoll(argv[2], NULL, 0);
> string = argv[3];
>
> fd = open(pathname, O_RDWR);
> if (fd == -1)
> errExit("open");
>
> if (lseek(fd, offset, SEEK_SET) == -1)
> errExit("lseek");
>
> numWritten = write(fd, string, strlen(string));
> if (numWritten == -1)
> errExit("write");
>
> printf("write() succeeded (return value %zd)\n", numWritten);
>
> exit(EXIT_SUCCESS);
> }
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/



--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/