Re: frequent lockups in 3.18rc4

From: Thomas Gleixner
Date: Wed Dec 03 2014 - 16:49:04 EST


On Wed, 3 Dec 2014, Dave Jones wrote:
> On Wed, Dec 03, 2014 at 09:59:20PM +0100, Thomas Gleixner wrote:
>
> > Can you please provide the cpuinfo flags of that box?
>
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
> nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64
> monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept
> vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
> xsaveopt

So that has nonstop_tsc and constant_tsc, which means that we switch
to sched_clock_stable, i.e. no range checks, nothing. We just take the
raw value and use it.
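
To make the difference concrete, here is a minimal userspace sketch of
the two read paths. It is not the kernel's code: the struct, the field
names and sketch_read_clock() are all hypothetical, and the "unstable"
path is reduced to a simple clamp against the last value handed out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a scheduler clock. */
struct sketch_sched_clock {
	int      stable;   /* 1: trust the raw counter as-is */
	uint64_t last;     /* last value handed out (unstable path only) */
	uint64_t max_step; /* largest forward jump accepted per read */
};

/*
 * Return a timestamp for the raw counter value 'raw'.
 *
 * Stable path:   the raw value is returned untouched, no checks.
 * Unstable path: the value is clamped to [last, last + max_step],
 *                so it never goes backwards and never leaps ahead.
 */
static uint64_t sketch_read_clock(struct sketch_sched_clock *sc, uint64_t raw)
{
	uint64_t lo, hi, v;

	assert(sc != NULL);

	if (sc->stable)
		return raw;

	lo = sc->last;
	hi = sc->last + sc->max_step;
	v = raw;
	if (v < lo)
		v = lo;
	if (v > hi)
		v = hi;

	sc->last = v;
	return v;
}
```

With stable set, a counter that jumps backwards or far ahead is passed
straight through to the caller. With stable cleared, the same input is
bounded by the last value returned.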

The clocksource code is a bit more paranoid and lets the TSC be
monitored by the watchdog. Now, if the TSC is detected as unstable we
should switch back to sched_clock_unstable, but we don't have a
mechanism for that.

That was obviously not considered when the sched_clock_stable stuff
was introduced. So sched_clock() happily uses TSC as a reliable thing
even when the clocksource code detected that it is crap.
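
The gap described above can be sketched as follows. This is a userspace
model under stated assumptions, not kernel code: the struct, the
threshold value and sketch_watchdog_check() are hypothetical, and the
'propagate' argument stands in for the missing step of telling the
scheduler clock that the counter can no longer be trusted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical skew limit for one watchdog interval, in nanoseconds. */
#define SKETCH_MAX_SKEW_NS 62500000ULL

/* Two independent views of the same counter. */
struct sketch_clock_views {
	bool clocksource_ok; /* what the watchdog believes */
	bool sched_stable;   /* what the scheduler clock believes */
};

/*
 * Compare the time elapsed according to the counter with the time
 * elapsed according to a reference clock. Returns true if they agree
 * within the limit. On disagreement the clocksource view is marked
 * bad; the scheduler view is only updated when 'propagate' is set.
 */
static bool sketch_watchdog_check(struct sketch_clock_views *v,
				  uint64_t counter_delta_ns,
				  uint64_t ref_delta_ns,
				  bool propagate)
{
	uint64_t skew;

	assert(v != NULL);

	skew = counter_delta_ns > ref_delta_ns ?
	       counter_delta_ns - ref_delta_ns :
	       ref_delta_ns - counter_delta_ns;

	if (skew <= SKETCH_MAX_SKEW_NS)
		return true;

	v->clocksource_ok = false;
	if (propagate)
		v->sched_stable = false; /* the step that is missing */

	return false;
}
```

Without propagation, a failed check leaves sched_stable set, so the two
views disagree about the same counter.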

For sure we need something here, but that sched_clock_stable
mechanism got introduced in 3.14, so it does not make any sense that
you observe that only post 3.16.

Thanks,

tglx







