Re: [PATCH 1/1] rcu/tree: add emergency pool for headless case

From: Joel Fernandes
Date: Mon Apr 06 2020 - 21:47:10 EST


On Mon, Apr 06, 2020 at 06:17:08PM +0200, Uladzislau Rezki wrote:
> On Mon, Apr 06, 2020 at 11:18:51AM -0400, Joel Fernandes wrote:
> > Hi Vlad,
> >
> > On Mon, Apr 06, 2020 at 02:56:40PM +0200, Uladzislau Rezki wrote:
> > > Hello, Joel.
> > >
> > > > > >
> > > > > > Hi Vlad,
> > > > > >
> > > > > > One concern I have is this moves the problem a bit further down. My belief is
> > > > > > we should avoid the likelihood of even needing an rcu_head allocated for the
> > > > > > headless case to begin with, rather than trying to do damage control when it does
> > > > > > happen. The only way we would end up needing an rcu_head is if we could not
> > > > > > allocate an array.
> > > > > >
> > > > > > Let me share my view on all such caching. I think it is now less of an
> > > > > > issue, because we now have the https://lkml.org/lkml/2020/4/2/383 patch.
> > > > > > I see that it helps a lot. I tried to simulate a low-memory condition and
> > > > > > apply high memory pressure with it. I did not manage to trigger the
> > > > > > "synchronize rcu" path at all. That is because we use much more permissive
> > > > > > parameters when we request memory from the SLAB (direct reclaim, etc.).
> > > >
> > > > That's a good sign that we don't hit this path in your tests.
> > > >
> > > > Just one request, if you have time of course :)
> > > > Could you please stress the system in your test environment and
> > > > double-check whether you also cannot hit it?
> >
> > Sure, I am planning to do so and happy to spend time on it :) One question I
> > had about the below test:
> >
> > > > How I test it. Please apply the patch below:
> > > <snip>
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 5e26145e9ead..25f7ac8583e1 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3203,6 +3203,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >
> > > if (head) {
> > > ptr = (void *) head - (unsigned long) func;
> > > + head = NULL;
> > > } else {
> > > /*
> > > * Please note there is a limitation for the head-less
> > > @@ -3233,16 +3234,18 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > * Under high memory pressure GFP_NOWAIT can fail,
> > > * in that case the emergency path is maintained.
> > > */
> > > - success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
> > > - if (!success) {
> > > + /* success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr); */
> >
> > If you drop this, then it is not realistic right? I mean it changes behavior
> > of the code completely. We need to try to allocate array and then try to
> > allocate the head.
> >
> That just bypasses the allocation of the array, to make it simpler
> and move straight to the path we would like to test. Also, head is
> set to NULL to simulate headless freeing.

Makes sense. I know you are forcing the code to hit the bad case more often,
but I was concerned the hack would change the dynamics of the code enough to
make it an unrealistic situation. But I see your point.

> > > + /* if (!success) { */
> > > /* Is headless object? */
> > > if (head == NULL) {
> > > /* Drop the lock. */
> > > krc_this_cpu_unlock(krcp, flags);
> > >
> > > head = attach_rcu_head_to_object(ptr);
> > > - if (head == NULL)
> > > + if (head == NULL) {
> > > + success = false;
> > > goto inline_return;
> > > + }
> > >
> > > /* Take it back. */
> > > krcp = krc_this_cpu_lock(&flags);
> > > @@ -3267,7 +3270,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > */
> > > expedited_drain = true;
> > > success = true;
> > > - }
> > > + /* } */
> > >
> > > WRITE_ONCE(krcp->count, krcp->count + 1);
> > >
> > > @@ -3297,7 +3300,9 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > if (!rcu_kfree_nowarn)
> > > WARN_ON_ONCE(1);
> > > debug_rcu_head_unqueue(ptr);
> > > - synchronize_rcu();
> > > + /* synchronize_rcu(); */
> > > + printk(KERN_ERR "-> hit synchronize_rcu() path.\n");
> > > + trace_printk("-> hit synchronize_rcu() path.\n");
> > > kvfree(ptr);
> > > }
> > > }
> > > <snip>
> > >
> > > > Lower the memory size and run the kfree_rcu tests. It would be appreciated.
> >
> > I am happy to try out the diff if I can understand how the above diff is
> > close enough to the current code's behavior, if we are not using the array. One
> > other issue with the current kfree_rcu tests is that the test is itself the reason
> > for the pressure -- I believe we should also have some testing where
> > the memory pressure is caused elsewhere (such as a real user workload
> > causing OOM), and then we see how RCU behaves under OOM -- if we have too
> > many synchronous latencies, does the additional caching remove such latencies
> > under OOM? etc. I also want to look into your vmalloc tests.
> >
> Of course, having real tests would be good.

Agreed.

> > > > I guess also, with your latest patch on releasing the lock to be in a
> > > > non-atomic context, and then doing the allocation, it became even more
> > > > permissive? If you drop that patch and try again, do you still not hit the
> > > > synchronous path more often?
> > > >
> > > Yep. If I drop the patch, I can hit it.
> >
> > Ah, cool. So basically the direct-reclaim path does the synchronous waiting,
> > instead of synchronize_rcu(). Either way, we wait synchronously. How to choose
> > which way is better though? If direct reclaim improves the memory situation,
> > then we should enter that path. But if direct reclaim takes too much time
> > (thus hurting the kfree_rcu() latency), then perhaps it is better for
> > kfree_rcu() to just do the synchronize_rcu() and let someone else enter the
> > direct-reclaim path. We should probably quantify and see which approach works
> > better.
> >
> I look at it like this: the headless variant has to be called from a sleepable
> context, therefore it can sleep. Which is better, calling synchronize_rcu() or
> doing direct reclaim, depends on how many CPUs we have in the system. I suspect
> that doing direct reclaim is better, at least it will free some memory for us.
> We could also extend that patch and make it a bit different, for example do
> NOWAIT, then try ATOMIC, and as a last step do a GFP_KERNEL alloc.

Yes, that's a good idea. That way perhaps we reduce the chance that kfree_rcu()
enters direct reclaim. Let us do it that way. At least I don't see any
drawbacks in such an approach.
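
Roughly, the ordering would look like the userspace sketch below. This is
only an illustration: the enum values are stand-ins for GFP_NOWAIT,
GFP_ATOMIC and GFP_KERNEL, and alloc_tiered(), fake_alloc() and fail_below
are hypothetical names, not anything in the tree. The point is only that the
most permissive mode (the one that may sleep in direct reclaim) is tried last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for GFP_NOWAIT, GFP_ATOMIC, GFP_KERNEL; not the kernel flags. */
enum alloc_mode { MODE_NOWAIT, MODE_ATOMIC, MODE_KERNEL, MODE_COUNT };

/* Hypothetical allocator callback; returns NULL on failure. */
typedef void *(*alloc_fn)(size_t size, enum alloc_mode mode);

/*
 * Try the least intrusive mode first and escalate only on failure.
 * On success *used reports the mode that worked; on total failure
 * NULL is returned and *used is left untouched.
 */
static void *alloc_tiered(alloc_fn alloc, size_t size, enum alloc_mode *used)
{
	assert(alloc != NULL && used != NULL);

	for (int m = MODE_NOWAIT; m < MODE_COUNT; m++) {
		void *p = alloc(size, (enum alloc_mode)m);

		if (p) {
			*used = (enum alloc_mode)m;
			return p;
		}
	}
	return NULL;
}

/* Simulated memory pressure: every mode below fail_below fails. */
static enum alloc_mode fail_below = MODE_NOWAIT;

static void *fake_alloc(size_t size, enum alloc_mode mode)
{
	if (mode < fail_below)
		return NULL;
	return malloc(size);
}
```

Setting fail_below to MODE_KERNEL simulates the case where both
non-sleeping attempts fail and only the last, sleeping attempt succeeds;
setting it to MODE_COUNT simulates total failure, where the caller would
still need the synchronous fallback.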

thanks,

- Joel