Re: sched_yield() makes OpenLDAP slow

From: Howard Chu
Date: Mon Aug 22 2005 - 18:21:12 EST


Florian Weimer wrote:
* Howard Chu:
> That's not the complete story. BerkeleyDB provides a
> db_env_set_func_yield() hook to tell it what yield function it
> should use when its internal locking routines need such a function.
> If you don't set a specific hook, it just uses sleep(). The
> OpenLDAP backend will invoke this hook during some (not necessarily
> all) init sequences, to tell it to use the thread yield function
> that we selected in autoconf.

And this helps to increase performance substantially?

When the caller is a threaded program, yes, there is a substantial (measurable and noticeable) difference. Given that sleep() blocks the entire process, the difference is obvious.
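
For illustration only, a yield function of the kind such a hook would receive might look like the sketch below. It assumes a POSIX system and the older hook signature `int (*)(void)`; later BerkeleyDB releases changed that signature, so check the headers you build against. The name `example_yield` is hypothetical, and the registration call appears only in a comment because it needs the BerkeleyDB headers.

```c
#include <sched.h>   /* sched_yield() */

/* Hypothetical yield function: gives up the CPU without sleeping.
 * Returns 0 on success, matching the int-returning hook signature
 * assumed above. */
int example_yield(void)
{
    return sched_yield();
}

/* In a real program the hook would be registered before the
 * environment is opened, roughly:
 *     db_env_set_func_yield(example_yield);
 */
```

Unlike a sleep() call, this returns as soon as the scheduler hands the CPU back, so the caller is not held off for a fixed interval.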

> Note that (on systems that support inter-process mutexes) a
> BerkeleyDB database environment may be used by multiple processes
> concurrently.

Yes, I know this, and I haven't experienced that much trouble with
deadlocks. Maybe the way you structure and access the database
environment can be optimized for deadlock avoidance?

Maybe we already did this deadlock analysis and optimization, years ago when we first started developing this backend? Do you think everyone else in the world is a total fool?

> As such, the yield function that is provided must work both for
> threads within a single process (PTHREAD_SCOPE_PROCESS) as well as
> between processes (PTHREAD_SCOPE_SYSTEM).

If I understand you correctly, what you really need is a syscall
along the lines "don't run me again until all threads T that share
property X have run, where the Ts aren't necessarily in the same
process". The kernel isn't psychic, it can't really know which
processes to schedule to satisfy such a requirement. I don't even
think "has joined the Berkeley DB environment" is the desired
property, but something like "is part of this cycle in the wait-for
graph" or something similar.

You seem to believe we're looking for special treatment for the processes we're concerned with, and that's not true. If the system is busy with other processes, so be it, the system is busy. If you want better performance, you build a dedicated server and don't let anything else make the system busy. This is the way mission-critical services are delivered, regardless of the service. If you're not running on a dedicated system, then your deployment must not be mission critical, and so you shouldn't be surprised if a large gcc run slows down some other activities in the meantime. If you have a large nice'd job running before your normal priority jobs get their timeslice, then you should certainly wonder wtf the scheduler is doing, and why your system even claims to support nice() when clearly it doesn't mean anything on that system.

I would have to check the Berkeley DB internals in order to tell what
is feasible to implement. This code shouldn't be on the fast path,
so some kernel-based synchronization is probably sufficient.

pthread_cond_wait() probably would be just fine here, but BerkeleyDB doesn't work that way.
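
As a rough sketch of what a condition-variable wait looks like in general (not how BerkeleyDB's locking is written), a waiter can sleep in the kernel until another thread signals that a resource is free, rather than yielding in a loop. The `struct resource` type and both function names are hypothetical, and the sketch covers threads within one process only; sharing it between processes would additionally need process-shared mutex and condition attributes.

```c
#include <pthread.h>

/* Hypothetical shared state: a busy flag guarded by a mutex, with a
 * condition variable used to announce that the flag was cleared. */
struct resource {
    pthread_mutex_t lock;
    pthread_cond_t  freed;
    int             busy;
};

/* Block until the resource is free, then claim it. */
void resource_acquire(struct resource *r)
{
    pthread_mutex_lock(&r->lock);
    while (r->busy)                  /* re-check: wakeups may be spurious */
        pthread_cond_wait(&r->freed, &r->lock);
    r->busy = 1;
    pthread_mutex_unlock(&r->lock);
}

/* Mark the resource free and wake one waiter, if any. */
void resource_release(struct resource *r)
{
    pthread_mutex_lock(&r->lock);
    r->busy = 0;
    pthread_cond_signal(&r->freed);
    pthread_mutex_unlock(&r->lock);
}
```

A caller would initialize the structure with `PTHREAD_MUTEX_INITIALIZER` and `PTHREAD_COND_INITIALIZER`, then pair every `resource_acquire()` with a `resource_release()`.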

--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
