Re: [PATCH] tell gcc optimizer to never introduce new data races

From: Jiri Kosina
Date: Tue Jun 10 2014 - 14:11:01 EST


On Tue, 10 Jun 2014, Linus Torvalds wrote:

> > We have been chasing a memory corruption bug, which turned out to be
> > caused by very old gcc (4.3.4), which happily turned conditional load into
> > a non-conditional one, and that broke correctness (the condition was met
> > only if lock was held) and corrupted memory.
>
> Just out of interest, can you point to the particular kernel code that
> caused this? I think that's more interesting than the example program
> you show - which I'm sure is really nice for gcc developers as an
> example, but from a kernel standpoint I think it's more important to
> show the particular problems this caused for the kernel?

Well, as I said, that was with gcc 4.3.4, and GCC people expressed
themselves that that was a slightly different optimization. It made me
nervous enough though to ask whether it's absolutely positively not going
to happen with newer gccs, and the code snippet quoted in the original
mail came back as a response.

The code in question was out-of-tree printk-in-NMI (yeah, surprise
suprise, once again) patch written by Petr Mladek, let me quote his
comment from our internal bugzilla:

===
I have spent few days investigating inconsistent state of kernel ring buffer.
It went out that it was caused by speculative store generated by
gcc-4.3.4.

The problem is in assembly generated for make_free_space(). The functions is
called the following way:

+ vprintk_emit();
+ log = MAIN_LOG; // with logbuf_lock
or
log = NMI_LOG; // with nmi_logbuf_lock
cont_add(log, ...);
+ cont_flush(log, ...);
+ log_store(log, ...);
+ log_make_free_space(log, ...);

If called with log = NMI_LOG then only nmi_log_* global variables are safe to
modify but the generated code does store also into (main_)log_* global
variables:


<log_make_free_space>:
55 push %rbp
89 f6 mov %esi,%esi

48 8b 05 03 99 51 01 mov 0x1519903(%rip),%rax # ffffffff82620868 <nmi_log_next_id>
44 8b 1d ec 98 51 01 mov 0x15198ec(%rip),%r11d # ffffffff82620858 <log_next_idx>
8b 35 36 60 14 01 mov 0x1146036(%rip),%esi # ffffffff8224cfa8 <log_buf_len>
44 8b 35 33 60 14 01 mov 0x1146033(%rip),%r14d # ffffffff8224cfac <nmi_log_buf_len>
4c 8b 2d d0 98 51 01 mov 0x15198d0(%rip),%r13 # ffffffff82620850 <log_next_seq>
4c 8b 25 11 61 14 01 mov 0x1146111(%rip),%r12 # ffffffff8224d098 <log_buf>
49 89 c2 mov %rax,%r10
48 21 c2 and %rax,%rdx
48 8b 1d 0c 99 55 01 mov 0x155990c(%rip),%rbx # ffffffff826608a0 <nmi_log_buf>
49 c1 ea 20 shr $0x20,%r10
48 89 55 d0 mov %rdx,-0x30(%rbp)
44 29 de sub %r11d,%esi
45 29 d6 sub %r10d,%r14d
4c 8b 0d 97 98 51 01 mov 0x1519897(%rip),%r9 # ffffffff82620840 <log_first_seq>
eb 7e jmp ffffffff81107029 <log_make_free_space+0xe9>
[...]
85 ff test %edi,%edi # edi = 1 for NMI_LOG
4c 89 e8 mov %r13,%rax
4c 89 ca mov %r9,%rdx
74 0a je ffffffff8110703d <log_make_free_space+0xfd>
8b 15 27 98 51 01 mov 0x1519827(%rip),%edx # ffffffff82620860 <nmi_log_first_id>
48 8b 45 d0 mov -0x30(%rbp),%rax
48 39 c2 cmp %rax,%rdx # end of loop
0f 84 da 00 00 00 je ffffffff81107120 <log_make_free_space+0x1e0>
[...]
85 ff test %edi,%edi # edi = 1 for NMI_LOG
4c 89 0d 17 97 51 01 mov %r9,0x1519717(%rip) # ffffffff82620840 <log_first_seq>
^^^^^^^^^^^^^^^^^^^^^^^^^^
KABOOOM
74 35 je ffffffff81107160 <log_make_free_space+0x220>


It stores log_first_seq when edi == NMI_LOG. This instructions are used also
when edi == MAIN_LOG but the store is done speculatively before the condition
is decided. It is unsafe because we do not have "logbuf_lock" in NMI context
and some other process migh modify "log_first_seq" in parallel.
===

I believe that the best course of action is both

- building kernel (and anything multi-threaded, I guess) with that
optimization turned off
- persuade gcc folks to change the default for future releases

Thanks,

--
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/