Re: spin_unlock optimization(i386)

Andrea Arcangeli (andrea@suse.de)
Sun, 28 Nov 1999 22:58:57 +0100 (CET)

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Artur Skawina: "Re: gcc and builtin optimizations"
Previous message: Richard B. Johnson: "Re: printk ?"

On Sun, 28 Nov 1999, Gerard Roudier wrote:

>In my opinion, 2 agents that write to the same location (current_state)
>without any synchronisation can only mess up any kind of ordering that
>applies to the view of all agents observing a single agent.

If the second write (to current_state) happens after the first write is
visible to the other CPUs it doesn't matter who wins. The _first_ write
(to locked) has no contentions as there is only one writer.

>Btw, the lock after the double write from thread 2:
>
> "movl $0,%0\n\t"
> "movl $0,%1\n\t"
>--> "lock; addl $0,(%%esp)\n\t" /* flush our write buffers*/
>
>does not fix at all the disordering problem, IMO.

The lock there is not necessary and it's confusing IMHO.

>In my opinion, the only situation that could make sense is to lock for
>the both writes to 'current_state'.

You don't need to do that. If you know that the writes will be visible to
the other CPUs in order then there's no problem.

start with locked = 1 and current_state = 0.

CPU 0 (waiter) CPU 1 (event)
-------------- -------------
movl $0, locked
(0) movl $1, current_state movl $0, current_state
(1) movl locked, %%eax (eax = 0)
(2)

It doesn't matter what gets written to current_state. The two writes on
line (0) can go in parallel. It doesn't matter which CPU wins and what a
`movl current_state, %%ebx` would load into %ebx if run in line (2). The
only thing that matters is that the instruction (1) will load into %%eax 0
and not 1. Currently is loading into %%eax 1 and so it hangs.

This problem could be explained by the speculative read that could happen
this way:

start with locked = 1 and current_state = 0.

CPU 0 (waiter) CPU 1 (event)
-------------- -------------
--> movl locked, %%eax (eax = 1)
| movl $0, locked
| movl $1, current_state movl $0, current_state
---

That would clearly explain the deadlock and why putting a memory barrier
after the write in CPU 0 fixes the problem (and such barrier is just
present as we was expecting the CPU to show such a reordering).

>By the way, I never liked the sequence that consists in changing the
>current->state and relying on some barrier to avoid race.

Why? That allows us to not cli().

>In my opinion, the only way to be obviously correct is to have the
>whole sequence that tests the condition and that move the task to its wait
>state actually protected.
>
>Something that the following make sense to me:
>
> LOCK(some_lock)
> if (CONDITION is FALSE) then SLEEP(some_event, &some_lock)

Your lock usually is cli() and infact lots of code is just doing that and
should be ported to the wait event interface in order to not harm irq
latency. I sure agree the above interface is simpler but the wait_event
should be equally SMP safe now with the lock on the bus and it provides
advantages.

>And have SLEEP that first puts the TASK into some sleeping state on
>'some_event' and _then_ unlock 'some_lock'.

Yes it make tons of sense and that's exactly why schedule() does an
implicit sti() for unlocking the 'some lock'. It's for the code that you
described above ;).

Andrea

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: Artur Skawina: "Re: gcc and builtin optimizations"
Previous message: Richard B. Johnson: "Re: printk ?"