[PATCH] irq/timings: Fix model validity

From: Peter Zijlstra
Date: Wed Nov 07 2018 - 04:46:34 EST


On Wed, Nov 07, 2018 at 09:59:36AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 07, 2018 at 12:39:31AM +0100, Rafael J. Wysocki wrote:

> > In general, however, I need to be convinced that interrupts that
> > didn't wake up the CPU from idle are relevant for next wakeup
> > prediction. I see that this may be the case, but to what extent is
> > rather unclear to me and it looks like calling
> > irq_timings_next_event() would add considerable overhead.
>
> How about we add a (debug) knob so that people can play with it for now?
> If it turns out to be useful, we'll learn.

That said; Daniel, I think there is a problem with how irqs_update()
sets irqs->valid. We seem to set valid even when we're still training.
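
To make the intended ordering concrete, here is a rough userspace sketch, not the kernel code: train on every sample, then return before any prediction until the sample threshold is reached. The names struct model, model_update() and SKETCH_SHIFT are made up for illustration, the shift value of 5 is an assumption, and the anomaly detection is left out entirely.

```c
#include <stdint.h>

#define SKETCH_SHIFT		5	/* assumed stand-in for IRQ_TIMINGS_SHIFT */
#define SAMPLE_THRESHOLD	30	/* same threshold the patch introduces */

/* Hypothetical, stripped-down stand-in for struct irqt_stat. */
struct model {
	int64_t avg;
	int64_t variance;
	unsigned int nr_samples;
	int valid;
};

/*
 * Feed one interval into the model. Returns 1 when the model has seen
 * enough samples to be used for prediction, 0 while it is still training.
 */
static int model_update(struct model *m, int64_t interval)
{
	int64_t diff = interval - m->avg;

	/* Sketch assumption: nothing is valid until proven otherwise. */
	m->valid = 0;

	/*
	 * Training always happens first. A plain division is used here
	 * instead of a shift to stay portable for negative values.
	 */
	m->avg += diff / (1 << SKETCH_SHIFT);
	m->variance += diff * (interval - m->avg);
	m->nr_samples++;

	/* Still training: do not claim a valid prediction. */
	if (m->nr_samples < SAMPLE_THRESHOLD)
		return 0;

	m->valid = 1;
	return 1;
}
```

With this ordering, the first SAMPLE_THRESHOLD - 1 calls only update the statistics, and valid can only be set from the 30th sample on.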

---
Subject: irq/timings: Fix model validity

The per-IRQ timing predictor will produce a 'valid' prediction even if
the model is still training. This should not happen.

Fix this by moving the actual training (online stddev algorithm) up a
bit and returning early (before predicting) when we've not yet reached
the sample threshold.

A direct consequence is that the predictor will only ever run with at
least that many samples, which means we can remove one branch.

Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
kernel/irq/timings.c | 66 +++++++++++++++++++++++++++++-----------------------
1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/kernel/irq/timings.c b/kernel/irq/timings.c
index 1e4cb63a5c82..5d22fd5facd5 100644
--- a/kernel/irq/timings.c
+++ b/kernel/irq/timings.c
@@ -28,6 +28,13 @@ struct irqt_stat {
int valid;
};

+/*
+ * The rule of thumb in statistics for the normal distribution
+ * is having at least 30 samples in order to have the model to
+ * apply.
+ */
+#define SAMPLE_THRESHOLD 30
+
static DEFINE_IDR(irqt_stats);

void irq_timings_enable(void)
@@ -101,7 +108,6 @@ void irq_timings_disable(void)
* distribution appears when the number of samples is 30 (it is the
* rule of thumb in statistics, cf. "30 samples" on Internet). When
* there are three consecutive anomalies, the statistics are resetted.
- *
*/
static void irqs_update(struct irqt_stat *irqs, u64 ts)
{
@@ -146,11 +152,38 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
*/
diff = interval - irqs->avg;

+ /*
+ * Online average algorithm:
+ *
+ * new_average = average + ((value - average) / count)
+ *
+ * The variance computation depends on the new average
+ * to be computed here first.
+ *
+ */
+ irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT);
+
+ /*
+ * Online variance algorithm:
+ *
+ * new_variance = variance + (value - average) x (value - new_average)
+ *
+ * Warning: irqs->avg is updated with the line above, hence
+ * 'interval - irqs->avg' is no longer equal to 'diff'
+ */
+ irqs->variance = irqs->variance + (diff * (interval - irqs->avg));
+
/*
* Increment the number of samples.
*/
irqs->nr_samples++;

+ /*
+ * If we're still training the model, we can't make any predictions yet.
+ */
+ if (irqs->nr_samples < SAMPLE_THRESHOLD)
+ return;
+
/*
* Online variance divided by the number of elements if there
* is more than one sample. Normally the formula is division
@@ -158,16 +191,12 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
* more than 32 and dividing by 32 instead of 31 is enough
* precise.
*/
- if (likely(irqs->nr_samples > 1))
- variance = irqs->variance >> IRQ_TIMINGS_SHIFT;
+ variance = irqs->variance >> IRQ_TIMINGS_SHIFT;

/*
- * The rule of thumb in statistics for the normal distribution
- * is having at least 30 samples in order to have the model to
- * apply. Values outside the interval are considered as an
- * anomaly.
+ * Values outside the interval are considered as an anomaly.
*/
- if ((irqs->nr_samples >= 30) && ((diff * diff) > (9 * variance))) {
+ if ((diff * diff) > (9 * variance)) {
/*
* After three consecutive anomalies, we reset the
* stats as it is no longer stable enough.
@@ -191,27 +220,6 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
*/
irqs->valid = 1;

- /*
- * Online average algorithm:
- *
- * new_average = average + ((value - average) / count)
- *
- * The variance computation depends on the new average
- * to be computed here first.
- *
- */
- irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT);
-
- /*
- * Online variance algorithm:
- *
- * new_variance = variance + (value - average) x (value - new_average)
- *
- * Warning: irqs->avg is updated with the line above, hence
- * 'interval - irqs->avg' is no longer equal to 'diff'
- */
- irqs->variance = irqs->variance + (diff * (interval - irqs->avg));
-
/*
* Update the next event
*/