Re: [PATCH] refscale: Fix use of uninitialized wait_queue_head_t

From: Paul E. McKenney
Date: Fri Jul 07 2023 - 12:16:10 EST


On Fri, Jul 07, 2023 at 10:56:51AM -0400, Waiman Long wrote:
> On 7/7/23 10:07, Davidlohr Bueso wrote:
> > On Thu, 06 Jul 2023, Waiman Long wrote:
> >
> > > It was found that running the refscale test might sometimes crash the
> > > kernel with the following error:
> > >
> > > [ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
> > > [ 8569.952900] #PF: supervisor read access in kernel mode
> > > [ 8569.952902] #PF: error_code(0x0000) - not-present page
> > > [ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
> > > [ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
> > > [ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
> > > [ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
> > >  :
> > > [ 8569.952940] Call Trace:
> > > [ 8569.952941]  <TASK>
> > > [ 8569.952944]  ref_scale_reader+0x380/0x4a0 [refscale]
> > > [ 8569.952959]  kthread+0x10e/0x130
> > > [ 8569.952966]  ret_from_fork+0x1f/0x30
> > > [ 8569.952973]  </TASK>
> > >
> > > This is likely caused by the fact that init_waitqueue_head() is called
> > > after the ref_scale_reader kthread is created. So the kthread may try
> > > to use the waitqueue head before it is properly initialized. Fix this
> > > by initializing the waitqueue head before creating the kthread.
> > >
> > > Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization")
> > > Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
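The ordering problem described in the patch can be illustrated outside the kernel. The following is a minimal userspace sketch using POSIX threads, not the refscale code itself: pthread_cond_t and pthread_create() stand in for wait_queue_head_t and kthread creation, and the names reader_task, reader_fn() and run_reader_demo() are hypothetical. The point is only the order of operations: the wait object is initialized before the thread that waits on it is started.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-reader state; pthread_cond_t stands in for the
 * kernel's wait_queue_head_t in this userspace analog. */
struct reader_task {
	pthread_mutex_t lock;
	pthread_cond_t wq;	/* analog of wait_queue_head_t */
	bool start;		/* set by the creator to release the reader */
	int result;		/* written by the reader once released */
};

/* Reader thread: it may start running before pthread_create() even
 * returns, so rt->lock and rt->wq must already be initialized. */
static void *reader_fn(void *arg)
{
	struct reader_task *rt = arg;

	pthread_mutex_lock(&rt->lock);
	while (!rt->start)
		pthread_cond_wait(&rt->wq, &rt->lock);
	assert(rt->start);	/* only proceed once released */
	rt->result = 42;
	pthread_mutex_unlock(&rt->lock);
	return NULL;
}

/* Safe ordering: initialize the wait objects, then create the thread.
 * Returns the reader's result, or -1 if setup fails. */
static int run_reader_demo(void)
{
	struct reader_task rt = { .start = false, .result = 0 };
	pthread_t tid;
	int ret = -1;

	if (pthread_mutex_init(&rt.lock, NULL) != 0)
		return -1;
	if (pthread_cond_init(&rt.wq, NULL) != 0)	/* before pthread_create() */
		goto out_mutex;

	if (pthread_create(&tid, NULL, reader_fn, &rt) != 0)
		goto out_cond;

	/* Release the reader, then wait for it to finish. */
	pthread_mutex_lock(&rt.lock);
	rt.start = true;
	pthread_cond_signal(&rt.wq);
	pthread_mutex_unlock(&rt.lock);

	pthread_join(tid, NULL);
	ret = rt.result;
out_cond:
	pthread_cond_destroy(&rt.wq);
out_mutex:
	pthread_mutex_destroy(&rt.lock);
	return ret;
}
```

Moving the pthread_cond_init() call below pthread_create() would reopen the same window: the reader could reach pthread_cond_wait() on a condition variable that has not been initialized yet, which is undefined behavior, much as ref_scale_reader could reach prepare_to_wait_event() before init_waitqueue_head() had run.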
> >
> > Strange this wasn't reported sooner.
>
> Red Hat does have a pretty large QE organization that runs all sorts of tests,
> including this one, pretty frequently. The race window is pretty small, but
> they did hit it once in a while.

I do run this fairly frequently, but haven't managed to hit it.

Good show on making it happen, and looking forward to the updated patch!

Thanx, Paul