Re: per cpun+ spin locks coexistence?

From: Peter Teoh
Date: Wed Mar 19 2008 - 20:08:07 EST


On 3/18/08, Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:

>
> You are right Peter, that fs/file.c contains some leftover from previous
> implementation of defer queue,
> that was using a timer.
>
> So we can probably provide a patch that :
>
> - Use spin_lock() & spin_unlock() instead of spin_lock_bh() &
> spin_unlock_bh() in free_fdtable_work()
> since we dont anymore use a softirq (timer) to reschedule the workqueue.
>
> ( this timer was deleted by the following patch :
> http://readlist.com/lists/vger.kernel.org/linux-kernel/50/251040.html
>
>
> But, you cannot avoid use of spin_lock()/spin_unlock() because
> schedule_work() makes no garantee that the work will be done by this cpu.

Ah.....u have hit the nail....and combine with Johannes Weiner's
explanation, I have pieced together the full scenario:

First, the following is possible:

fddef = &get_cpu_var(fdtable_defer_list);
spin_lock(&fddef->lock);
fdt->next = fddef->next;
fddef->next = fdt;==============>executing at CPU A
/* vmallocs are handled from the workqueue context */
schedule_work(&fddef->wq);
spin_unlock(&fddef->lock);==============>executing at CPU B
put_cpu_var(fdtable_defer_list);

where the execution can switch CPU after the schedule_work() API, then
LOGICALLY u definitely need the spin_lock(), and the per_cpu data is
really not necessary.

But without the per_cpu structure, then the following "dedicated
chunk" can only execute on one processor, with the possibility of
switching to another processor after schedule_work():

So then we introduce the per_cpu structure - so that the "dedicated
chunk" can be executing on multiple processor ALL AT THE SAME TIME,
without interferring each other, as fddef are per-cpu (rightfully
owned only before schedule_work() is called, but after schedule_work()
is called, an arbitrary CPU will be executing this fddef).

spin_lock() is necessary because of the possibility of CPU switch
(schedule_work()).

and per_cpu is so that the same chunk of code can be executing at
multiple CPUs all at the same time.

Now the key issue rises up - as I have just asked before u answered my question:

http://mail.nl.linux.org/kernelnewbies/2008-03/msg00236.html

can schedule_work() sleep? (just like schedule(), whcih can sleep right?)
schedule_work() is guaranteed to execute the work queue at least once,
and so this thread may or may not sleep. correct? Or wrong?

Problem is when u sleep and never wake up, then the spin_lock become
permanently locked, and when later the same CPU (have to be the same
fddef CPU) is being reschedule to execute the get_cpu_var() again, it
will spin_lock() infinitely, resulting in 100% CPU utilization error.

To prevent these types of error, spin_lock are always not to be used
with to wrap around functions that can sleep, and can only containing
short routines between lock and unlock.

Is my analysis correct?

--
Regards,
Peter Teoh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/