Re: [PATCH 2/3, V2] kernel: Move groups_sort to the caller of set_groups.

From: Thiago Rafael Becker
Date: Tue Dec 05 2017 - 18:03:59 EST




On Tue, 5 Dec 2017, Matthew Wilcox wrote:

On Tue, Dec 05, 2017 at 07:11:00AM +1100, NeilBrown wrote:
As we don't seem to be pursuing this possibility it probably isn't very
important, but I'd like to point out that the original fix isn't a true
fix.
It just sorts a shared group_info early. This does not stop corruption.
Every time a thread calls set_groups() on that group_info it will be
sorted again.
The sort algorithm used is the heap sort, and a heap sort always moves
elements in the array around - it does not leave a sorted array
untouched (unlike e.g. the quick sort which doesn't move anything in a
sorted array).
So it is still possible for two calls to groups_sort() to race.
We *need* to move groups_sort() out of set_groups().
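
To make the point about heap sort concrete, here is a minimal sketch of a textbook in-place heap sort on ints that counts its swaps. It is not the kernel's sort() from lib/sort.c; heap_sort_ints(), sift_down(), swap_ints() and swap_count are names made up for this illustration. The only thing it is meant to show is that an already-sorted input still gets written to.

```c
#include <stddef.h>

/* Number of element swaps performed by the last heap_sort_ints() call. */
static size_t swap_count;

static void swap_ints(int *a, int *b)
{
	int t = *a;

	*a = *b;
	*b = t;
	swap_count++;
}

/* Restore the max-heap property for the subtree rooted at index root. */
static void sift_down(int *v, size_t root, size_t n)
{
	for (;;) {
		size_t child = 2 * root + 1;

		if (child >= n)
			return;
		/* Pick the larger of the two children. */
		if (child + 1 < n && v[child] < v[child + 1])
			child++;
		if (v[root] >= v[child])
			return;
		swap_ints(&v[root], &v[child]);
		root = child;
	}
}

/* Textbook heap sort, ascending order (illustration only). */
static void heap_sort_ints(int *v, size_t n)
{
	size_t i;

	swap_count = 0;
	if (n < 2)
		return;
	/* Build a max-heap: this already reorders a sorted array. */
	for (i = n / 2; i-- > 0; )
		sift_down(v, i, n);
	/* Repeatedly move the current maximum to the end. */
	for (i = n - 1; i > 0; i--) {
		swap_ints(&v[0], &v[i]);
		sift_down(v, 0, i);
	}
}
```

Calling heap_sort_ints() on an array such as {1, 2, 3, 4} leaves it in the same final order, but swap_count ends up greater than zero, so the array passes through unsorted intermediate states while the sort runs.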

Hum, makes sense. I've applied it to the most recent Fedora kernel (that uses heapsort) and I didn't see the problem again. I should run a few more repetitions to be sure.

It must be relatively common to sort an already-sorted array. I wonder
if something like this patch would be worthwhile?

I have deliberately broken this patch so it can't be applied. I haven't
tested it, and for all I know, I got the sign of cmp_func wrong.

diff --git a/lib/sort.c b/lib/sort.c
index d6b7a202b0b6..2b527fde6dad 100644
--- a/lib/sort.c
+++ b/lib/sort.c
@@ -75,7 +75,14 @@ void sort(void *base, size_t num, size_t size,
 			swap_func = generic_swap;
 	}
 
-	/* heapify */
+	/* Do not sort an already-sorted array */
+	for (c = 0; c < (n - size); c += size) {
+		if (cmp_func(base + c, base + c + size) < 0)
+			goto heapify;
+	}
+	return;
+
+heapify:
 	for ( ; i >= 0; i -= size) {
 		for (r = i; r * 2 + size < n; r = c) {
 			c = r * 2 + size;
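
For reference, a standalone sketch of the "already sorted?" check might look like the following. It assumes the usual comparison convention, where cmp_func(a, b) returns a positive value when a must come after b in ascending order; under that assumption the test that detects an out-of-order pair is "> 0", not "< 0". is_sorted() and cmp_int() are hypothetical helpers written for this example and are not part of lib/sort.c.

```c
#include <stddef.h>

/*
 * Return 1 if the num elements of size bytes starting at base are
 * already in ascending order according to cmp_func, 0 otherwise.
 * The array is only read, never written.
 */
static int is_sorted(const void *base, size_t num, size_t size,
		     int (*cmp_func)(const void *, const void *))
{
	const char *p = base;
	size_t i;

	for (i = 1; i < num; i++) {
		/* Positive result: the earlier element is the larger one. */
		if (cmp_func(p + (i - 1) * size, p + i * size) > 0)
			return 0;
	}
	return 1;
}

/* Example comparator for ints (assumed convention, see above). */
static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a;
	int y = *(const int *)b;

	return (x > y) - (x < y);
}
```

Such a check avoids writes only when the array is already sorted on entry; it does nothing to serialize two callers that both find the array unsorted.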

The bug happens when two threads enter groups_sort() for the same group_info in parallel, and one thread starts overwriting values that the other thread may already have "heapified" or sorted.

Thread A                        Thread B
Enter groups_sort
                                Enter groups_sort
.
.
.
Return from groups_sort
                                .
                                .
                                .
                                Return from groups_sort

Wouldn't this patch just make both threads see the structure as unsorted and sort them?
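
For comparison, the approach in the subject line (sort in the caller, before the structure is shared) can be sketched roughly as below. This is only an illustration under simplifying assumptions: struct toy_group_info, toy_groups_sort(), toy_set_groups() and toy_current_groups are made-up stand-ins, not the kernel's group_info API, and the C library's qsort() replaces the kernel's sort().

```c
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct group_info. */
struct toy_group_info {
	size_t ngroups;
	unsigned int gid[8];
};

/* Hypothetical stand-in for the pointer a set_groups() call installs. */
static const struct toy_group_info *toy_current_groups;

static int toy_gid_cmp(const void *a, const void *b)
{
	unsigned int x = *(const unsigned int *)a;
	unsigned int y = *(const unsigned int *)b;

	return (x > y) - (x < y);
}

/* Called once by the owner, while the structure is still private. */
static void toy_groups_sort(struct toy_group_info *gi)
{
	qsort(gi->gid, gi->ngroups, sizeof(gi->gid[0]), toy_gid_cmp);
}

/* Publishing only stores a pointer; it never writes to the array. */
static void toy_set_groups(const struct toy_group_info *gi)
{
	toy_current_groups = gi;
}
```

With this split, any number of threads can call toy_set_groups() on the same structure without touching its contents, because the only writer ran before the structure was visible to anyone else.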

Thanks,
trbecker