Re: Dynamic configure max_cstate

From: Len Brown
Date: Fri Jul 31 2009 - 11:14:42 EST


> And in addition to this, we should also take into account (read: skip)
> any idle states which kill busmaster DMA completely
> (in case of busmaster DMA I/O activities, that is).

It isn't so simple.
This is system specific.

In the old days, a C3-type C-state would lock down the bus
in order to ensure no DMA could pass by to memory
before the processor could wake up to snoop.

Then a few years ago the hardware would allow us to
enter C3-type C-states, but transparently "pop-up"
into C2 to retire the snoop activity without ever
waking the processor to C0. This was good because
it was more efficient than waking to C0, but bad
because the OS could not easily tell whether it actually
got the C3 time it requested, or was instead
spending a bunch of time demoted to C2...

In the most recent hardware, the core's cache is flushed
in deep C-states so that the core need not be woken
at all to snoop DMA activity.
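To make the "system specific" point concrete, here is a rough Python model of the three behaviors above. The enum names and summaries are illustrative labels only, not kernel or ACPI identifiers, and treating only the bus-lock generation as actually stalling DMA is a simplifying assumption. It shows why a blanket "skip C3-type states during bus-master DMA" rule is only clearly justified on the oldest hardware.

```python
from enum import Enum


class C3Gen(Enum):
    """Hypothetical labels for the three hardware generations described above."""
    BUS_LOCK = "bus-lock"  # older: bus locked down, so DMA waits for the core to wake
    POP_UP = "pop-up"      # later: core transparently pops up to C2 to service the snoop
    FLUSHED = "flushed"    # recent: cache flushed on entry, so there is nothing to snoop


def dma_effect(gen: C3Gen) -> str:
    """What a bus-master DMA transfer does to a core sitting in a C3-type state."""
    return {
        C3Gen.BUS_LOCK: "DMA held off until the core wakes to snoop",
        C3Gen.POP_UP: "core silently demoted to C2, OS cannot easily see it",
        C3Gen.FLUSHED: "core stays asleep, DMA proceeds to memory",
    }[gen]


def dma_stalls_in_c3(gen: C3Gen) -> bool:
    """Simplifying assumption: only the bus-lock generation actually stalls DMA."""
    return gen is C3Gen.BUS_LOCK


if __name__ == "__main__":
    for gen in C3Gen:
        print(f"{gen.value:9} stalls={dma_stalls_in_c3(gen)!s:5} {dma_effect(gen)}")
```

On the pop-up generation the cost is not stalled DMA but lost C3 residency the OS cannot easily observe, which is a different problem from the one the skip rule targets.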

Indeed, Yanmin's Nehalem box advertises two C3-type C-states,
but in reality, Nehalem doesn't have _any_ C3-type C-states,
only C2-type. The BIOS advertises C3-type C-states
so as not to break the installed base, which uses the presence
of C3-type C-states to work around the broken LAPIC timer
(the LAPIC timer stops in deep C-states).
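A minimal Python sketch of the installed-base heuristic described above, assuming it reduces to "any advertised C3-type state means the LAPIC timer may stop, so use an external timer instead". The class, function, and state names are hypothetical, and the table is only shaped like the Nehalem case (one C1 plus two C3-type entries); it is not real BIOS data or kernel code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AdvertisedCState:
    name: str       # label only; hypothetical
    acpi_type: int  # type the BIOS advertises: 1, 2 or 3


def needs_lapic_workaround(states: list[AdvertisedCState]) -> bool:
    """Simplified installed-base heuristic: the presence of any C3-type state
    is taken to mean the LAPIC timer may stop, so an external timer is used."""
    return any(s.acpi_type == 3 for s in states)


# What the BIOS advertises on a Nehalem-like box ...
advertised = [AdvertisedCState("C1", 1), AdvertisedCState("C3", 3), AdvertisedCState("C6", 3)]
# ... versus an "honest" table listing the deep states as C2-type (no DMA lockout).
honest = [AdvertisedCState(s.name, min(s.acpi_type, 2)) for s in advertised]

print(needs_lapic_workaround(advertised))  # True: workaround stays enabled
print(needs_lapic_workaround(honest))      # False: old kernels would trust a stopping timer
```

The second result is the breakage the BIOS avoids: advertising the deep states as C2-type would be truthful about DMA, but existing kernels would then stop applying their LAPIC timer workaround.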

I think the issue on the system in question is waking
up the processor in response to IO interrupt break events.
I.e., Linux does a good job with timer interrupts, but
isn't so smart about using IO interrupts for demoting
C-states. Arjan is looking at fixing this.
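One way to picture "using IO interrupts for demoting C-states" is a governor that predicts idle time from the next timer event and also from the recent spacing of IO interrupts, then picks the deepest state that fits. This Python sketch is a hypothetical heuristic for illustration only, not Arjan's fix or any existing Linux governor; the function names and target-residency numbers are invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IdleState:
    name: str
    target_residency_us: int  # minimum idle time for the state to pay off (hypothetical)


def predict_idle_us(next_timer_us: int, recent_io_intervals_us: list[int]) -> int:
    """A timer-only prediction uses next_timer_us alone; this also consults the
    recent spacing of IO interrupts so frequent IO break events shorten it."""
    if not recent_io_intervals_us:
        return next_timer_us
    avg_io = sum(recent_io_intervals_us) // len(recent_io_intervals_us)
    return min(next_timer_us, avg_io)


def pick_state(states: list[IdleState], predicted_us: int) -> IdleState:
    """Deepest state whose target residency fits the predicted idle time.
    Assumes states are sorted shallow to deep and states[0] always fits."""
    choice = states[0]
    for s in states:
        if s.target_residency_us <= predicted_us:
            choice = s
    return choice


states = [IdleState("C1", 1), IdleState("C3", 200), IdleState("C6", 800)]
print(pick_state(states, predict_idle_us(10_000, [])).name)               # C6
print(pick_state(states, predict_idle_us(10_000, [150, 120, 180])).name)  # C1
```

With no IO activity the 10 ms timer horizon allows the deepest state; with IO interrupts arriving roughly every 150 us, the same timer horizon is demoted to C1.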

cheers,
Len Brown, Intel Open Source Technology Center
