Re: per cpun+ spin locks coexistence?

From: Eric Dumazet
Date: Thu Mar 20 2008 - 05:34:50 EST


Peter Teoh a écrit :
On 3/18/08, Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:

You are right Peter, that fs/file.c contains some leftover from previous
implementation of defer queue,
that was using a timer.

So we can probably provide a patch that :

- Use spin_lock() & spin_unlock() instead of spin_lock_bh() &
spin_unlock_bh() in free_fdtable_work()
since we dont anymore use a softirq (timer) to reschedule the workqueue.

( this timer was deleted by the following patch :
http://readlist.com/lists/vger.kernel.org/linux-kernel/50/251040.html


But, you cannot avoid use of spin_lock()/spin_unlock() because
schedule_work() makes no garantee that the work will be done by this cpu.

Ah.....u have hit the nail....and combine with Johannes Weiner's
explanation, I have pieced together the full scenario:

First, the following is possible:

fddef = &get_cpu_var(fdtable_defer_list);
spin_lock(&fddef->lock);
fdt->next = fddef->next;
fddef->next = fdt;==============>executing at CPU A
/* vmallocs are handled from the workqueue context */
schedule_work(&fddef->wq);
spin_unlock(&fddef->lock);==============>executing at CPU B
put_cpu_var(fdtable_defer_list);

where the execution can switch CPU after the schedule_work() API, then
LOGICALLY u definitely need the spin_lock(), and the per_cpu data is
really not necessary.

But without the per_cpu structure, then the following "dedicated
chunk" can only execute on one processor, with the possibility of
switching to another processor after schedule_work():
Hum, you misunderstood the point.

schedule_work(); wont switch your current CPU, since you are inside a spin_lock
()/spin_unlock() pair, so preemption is not possible.



So then we introduce the per_cpu structure - so that the "dedicated
chunk" can be executing on multiple processor ALL AT THE SAME TIME,
without interferring each other, as fddef are per-cpu (rightfully
owned only before schedule_work() is called, but after schedule_work()
is called, an arbitrary CPU will be executing this fddef).

spin_lock() is necessary because of the possibility of CPU switch
(schedule_work()).

and per_cpu is so that the same chunk of code can be executing at
multiple CPUs all at the same time.

Now the key issue rises up - as I have just asked before u answered my question:

http://mail.nl.linux.org/kernelnewbies/2008-03/msg00236.html

can schedule_work() sleep? (just like schedule(), whcih can sleep right?)
schedule_work() is guaranteed to execute the work queue at least once,
and so this thread may or may not sleep. correct? Or wrong?

schedule_work() cannot sleep. It only queues a work to be done later by a special thread.

We need this because vfree() should not be called from softirq handler (rcu in this case), so we queue a (small) job.
Problem is when u sleep and never wake up, then the spin_lock become
permanently locked, and when later the same CPU (have to be the same
fddef CPU) is being reschedule to execute the get_cpu_var() again, it
will spin_lock() infinitely, resulting in 100% CPU utilization error.

To prevent these types of error, spin_lock are always not to be used
with to wrap around functions that can sleep, and can only containing
short routines between lock and unlock.

Is my analysis correct?

Not exactly :) , but please continue to learn :)





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/