Re: [PATCH v2] sched/fair: Care divide error in update_task_scan_period()

From: Yasuaki Ishimatsu
Date: Wed Oct 22 2014 - 01:39:50 EST


(2014/10/21 18:21), Peter Zijlstra wrote:
On Thu, Oct 16, 2014 at 06:48:15PM +0900, Yasuaki Ishimatsu wrote:
+++ b/kernel/sched/fair.c
@@ -1466,6 +1466,7 @@ static void update_task_scan_period(struct task_struct *p,

unsigned long remote = p->numa_faults_locality[0];
unsigned long local = p->numa_faults_locality[1];
+ unsigned long total_faults = shared + private;

/*
* If there were no record hinting faults then either the task is
@@ -1496,6 +1497,14 @@ static void update_task_scan_period(struct task_struct *p,
slot = 1;
diff = slot * period_slot;
} else {
+ /*
+ * This is a rare case. total_faults might become 0 after
+ * offlining node. In this case, total_faults is set to 1
+ * for avoiding divide error.
+ */
+ if (unlikely(total_faults == 0))
+ total_faults = 1;
+
diff = -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot;

/*
@@ -1506,7 +1515,7 @@ static void update_task_scan_period(struct task_struct *p,
* scanning faster if shared accesses dominate as it may
* simply bounce migrations uselessly
*/
- ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared));
+ ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (total_faults));
diff = (diff * ratio) / NUMA_PERIOD_SLOTS;


So what was wrong with the 'normal' unconditional +1 approach? Also,
you've got superfluous parentheses.


I did not want to change the behavior of update_task_scan_period() in the
case where (private + shared) is not 0. But I understand your comment, and
I'll update the patch.
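
To make the trade-off concrete, here is a minimal userspace sketch of the
two approaches. It is not the kernel code: the helper names div_round_up(),
ratio_guarded() and ratio_plus_one() are hypothetical, NUMA_PERIOD_SLOTS is
assumed to be 10, and "unconditional +1" is read here as adding 1 to the
divisor.

```c
#include <assert.h>

/* Assumed value; check kernel/sched/fair.c in your tree. */
#define NUMA_PERIOD_SLOTS 10

/* Same rounding as the kernel's DIV_ROUND_UP(), precondition made explicit. */
static unsigned long div_round_up(unsigned long n, unsigned long d)
{
	assert(d != 0);	/* d == 0 is the divide error under discussion */
	return (n + d - 1) / d;
}

/* v2 approach: only touch the divisor when it would be 0. */
static unsigned long ratio_guarded(unsigned long private, unsigned long shared)
{
	unsigned long total_faults = shared + private;

	if (total_faults == 0)
		total_faults = 1;

	return div_round_up(private * NUMA_PERIOD_SLOTS, total_faults);
}

/* Unconditional +1: the divisor can never be 0, so no branch is needed. */
static unsigned long ratio_plus_one(unsigned long private, unsigned long shared)
{
	return div_round_up(private * NUMA_PERIOD_SLOTS, private + shared + 1);
}
```

Both variants return 0 when there are no faults at all. With small counts
the results differ (3 private and 1 shared faults give 8 with the guard and
6 with the +1), while with larger counts the extra 1 is lost in the rounding
(300 private and 100 shared give 8 either way).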

Thanks,
Yasuaki Ishimatsu
