Re: Discussion: quick_pit_calibrate is slow

From: George Spelvin
Date: Wed Jun 10 2015 - 05:11:48 EST


Ingo Molnar wrote:
>* George Spelvin <linux@xxxxxxxxxxx> wrote:

> As a side note: so VMs often want to skip the whole calibration business,
> because they are running on a well-calibrated host.

> 1,000 msecs is also an eternity: consider for example the KVM + tools/kvm
> based "Clear Containers" feature from Arjan:
> ... which boots up a generic Linux kernel to generic Linux user-space in 32
> milliseconds, i.e. it boots in 0.03 seconds (!).

Agreed, if you're paravirtualized, you can just pass this stuff in from
the host. But there's plenty of hardware virtualization that boots
a generic Linux.

I pulled generous numbers out of my ass because I didn't want to over-reach
in the argument that it's taking too long. The shorter the boot
time, the stronger the point.

>> With a total of 0.84 us of read uncertainty (1/12 of quick_pit_calibrate
>> currently), we can get within 500 ppm within 1.75 us. Or do better
>> within 5 or 10.

> (msec you mean I suspect?)

Yes, typo; that should be 1.75 ms.
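
A quick sketch of the arithmetic behind that figure. It assumes the
relative error is simply the total read uncertainty divided by the
measurement interval; the function names are made up for illustration
and are not from any patch.

```c
#include <assert.h>
#include <stdint.h>

/* Shortest interval (ns) over which err_ns of read uncertainty is <= ppm. */
static uint64_t min_interval_ns(uint64_t err_ns, uint64_t ppm)
{
	assert(ppm > 0);
	/* err / interval <= ppm / 1e6  =>  interval >= err * 1e6 / ppm */
	return (err_ns * 1000000ull + ppm - 1) / ppm;	/* round up */
}

/* Relative error in ppm (rounded up) after measuring for interval_ns. */
static uint64_t error_ppm(uint64_t err_ns, uint64_t interval_ns)
{
	assert(interval_ns > 0);
	return (err_ns * 1000000ull + interval_ns - 1) / interval_ns;
}
```

With 840 ns of uncertainty, min_interval_ns(840, 500) comes to 1,680,000 ns,
i.e. just under the 1.75 ms quoted above. The same uncertainty gives 168 ppm
after 5 ms and 84 ppm after 10 ms.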

>> The loop I'd write would start the PIT (and the RTC, if we want to)
>> and then go round-robin reading all the time sources and associated
>> TSC values.

> I'd just start with the PIT to have as few balls in flight as possible.

Once I get the loop structured properly, additional timers really
aren't a problem. The biggest PITA is the PM_TMR and all its
brokenness (do I have a PIIX machine in the closet somewhere?),
but the quick_pit_calibrate patch I already posted to LKML shows
how to handle that. I set up a small circular buffer of captured
values, and when I'm (say) three captures past the "interesting"
one, go back and see if the reads look good.
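
To make the circular-buffer idea concrete, here is a minimal userspace
sketch. It is not the posted patch: the struct layout, the buffer size, the
24-bit counter mask and the "looks good" test (the capture's timer value
must lie between its two neighbours in wrapping order) are all assumptions
chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u		/* power of two, so indices wrap with a mask */
#define LOOKBACK  3u		/* validate once 3 newer captures exist */
#define TMR_MASK  0xffffffu	/* assumed 24-bit timer counter */

struct capture {
	uint64_t tsc;		/* TSC value read alongside the timer */
	uint32_t timer;		/* raw timer count */
};

struct ring {
	struct capture slot[RING_SIZE];
	unsigned int count;	/* total captures pushed so far */
};

static void ring_push(struct ring *r, uint64_t tsc, uint32_t timer)
{
	struct capture *c = &r->slot[r->count & (RING_SIZE - 1)];

	c->tsc = tsc;
	c->timer = timer & TMR_MASK;
	r->count++;
}

/* Capture that is `back` pushes older than the newest one (0 = newest). */
static const struct capture *ring_peek(const struct ring *r, unsigned int back)
{
	assert(back < RING_SIZE && back < r->count);
	return &r->slot[(r->count - 1 - back) & (RING_SIZE - 1)];
}

/* Forward distance from one reading to another on the wrapping counter. */
static uint32_t tmr_delta(uint32_t from, uint32_t to)
{
	return (to - from) & TMR_MASK;
}

/* Check the capture LOOKBACK steps behind the newest against its neighbours. */
static bool interesting_capture_ok(const struct ring *r)
{
	const struct capture *prev, *cur, *next;

	if (r->count < LOOKBACK + 2)
		return false;	/* not enough history yet */
	prev = ring_peek(r, LOOKBACK + 1);
	cur  = ring_peek(r, LOOKBACK);
	next = ring_peek(r, LOOKBACK - 1);
	return tmr_delta(prev->timer, cur->timer) <=
	       tmr_delta(prev->timer, next->timer);
}
```

The point of deferring the check is that the timing loop only ever stores
values; the validation runs a few iterations later, once the neighbours it
needs are already in the buffer.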

> Could you please structure it the following way:
>
> - first a patch that fixes bogus comments about the current code. It has
> bitrotten and if we change it significantly we better have a well
> documented starting point that is easier to compare against.
>
> - then a patch that introduces your more accurate calibration method and
> uses it as the first method to calibrate. If it fails (and it should have a
> notion of failing) then it should fall back to the other two methods.
>
> - possibly add a boot option to skip your new calibration method -
> i.e. to make the kernel behave in the old way. This would be useful
> for tracking down any regressions in this.
>
> - then maybe add a patch for the RTC method, but as a .config driven opt-in
> initially.

Sounds good, but when do we get to the decruftification? I'd prefer to
prepare the final patch (if nothing else, so Linus will be reassured by
the diffstat), although I can see holding it back for a few releases.

> Please also add calibration tracing code (.config driven and default-off),
> so that the statistical properties of calibration can be debugged and
> validated without patching the kernel.

Definitely desired, but I have to be careful here. Obviously I can't
print during the timing loop, so it will take either a lot of memory,
or add significant computation to the loop.

I also don't want to flood the kernel log before syslog is
started.

Do you have any specific suggestions? Should I just capture everything
into a permanently-allocated buffer and export it via debugfs?

>> I realize this is a far bigger overhaul than Adrian proposed, but do other
>> people agree that some decruftification is warranted?

> Absolutely!

Thanks for the encouragement!

>> Any suggestions for a reasonable time/quality tradeoff? 500 ppm ASAP?
>> Best I can do in 10 ms? Wait until the PIT is 500 ppm and then use
>> the better result from a higher-resolution timer if available?

> So I'd suggest a minimum polling interval (at least 1 msecs?) plus a
> ppm target. Would 100ppm be too aggressive?

How about 122 ppm (1/8192) because I'm lazy? :-)
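
One reading of "lazy" here, as a sketch: with a power-of-two target the
accuracy test needs only a shift and a compare, no multiply or divide.
The function name and the choice of units are assumptions; both arguments
just need to be in the same unit (TSC cycles, say).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TARGET_SHIFT 13		/* 1/8192 is about 122 ppm */

/* True once the read uncertainty is at most 1/8192 of the elapsed time. */
static bool reached_target(uint64_t uncertainty, uint64_t elapsed)
{
	assert(uncertainty <= (UINT64_MAX >> TARGET_SHIFT));	/* no overflow */
	return (uncertainty << TARGET_SHIFT) <= elapsed;
}
```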

What I imagine is this:

- The code will loop until it reaches 122 ppm or 55 ms, whichever comes
first. (There's also a minimum, before which 122 ppm isn't checked.)
- Initially, failure to reach 122 ppm will print a message and fall back.
- In the final cleanup patch, I'll accept anything up to 500 ppm
and only fail (and disable TSC) if I can't reach that.
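
The policy in the list above could be sketched as a per-iteration decision
function. This is an illustration under assumptions, not the planned code:
the names are hypothetical, the 1 ms minimum is borrowed from Ingo's
suggestion rather than settled, and the accept_500ppm flag stands in for the
difference between the initial patch and the final cleanup patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum cal_status { CAL_CONTINUE, CAL_OK, CAL_FAIL };

#define CAL_MIN_NS	 1000000ull	/* assumed minimum polling interval */
#define CAL_MAX_NS	55000000ull	/* give up after 55 ms */
#define CAL_SHIFT	13		/* target: 1/8192, about 122 ppm */
#define CAL_LOOSE_PPM	500ull		/* looser limit for the final patch */

/*
 * Decide what to do after one pass of the timing loop.
 * elapsed_ns: time since calibration started
 * err_ns:     total read uncertainty of the current measurement
 */
static enum cal_status cal_step(uint64_t elapsed_ns, uint64_t err_ns,
				bool accept_500ppm)
{
	assert(err_ns <= UINT64_MAX / 1000000ull);	/* keeps the math in range */

	if (elapsed_ns < CAL_MIN_NS)
		return CAL_CONTINUE;	/* too early to check accuracy */
	if ((err_ns << CAL_SHIFT) <= elapsed_ns)
		return CAL_OK;		/* reached 122 ppm */
	if (elapsed_ns < CAL_MAX_NS)
		return CAL_CONTINUE;	/* keep polling */

	/* Timed out: the final patch would still accept up to 500 ppm. */
	if (accept_500ppm && err_ns * 1000000ull <= elapsed_ns * CAL_LOOSE_PPM)
		return CAL_OK;
	return CAL_FAIL;		/* caller falls back or disables TSC */
}
```

With accept_500ppm false this matches the initial behaviour (fail and fall
back if 122 ppm is not reached in 55 ms); with it true, a timeout still
succeeds as long as the result is within 500 ppm.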