Re: node-hotplug: is memset 0 safe in try_offline_node()?

From: Xishi Qiu
Date: Wed Mar 04 2015 - 02:02:17 EST


On 2015/3/4 11:53, Gu Zheng wrote:

> Hi Xishi,
>
> On 03/04/2015 10:22 AM, Xishi Qiu wrote:
>
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCUï) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>>>>
>>>> process A: offline node XX:
>>>> for_each_populated_zone()
>>>> find online node XX
>>>> cond_resched()
>>>> offline cpu and memory, then try_offline_node()
>>>> node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>>> access node XX's pgdat
>>>> NULL pointer access error
>>>
>>> It's possible, but I did not meet this condition, did you?
>>>
>>
>> Yes, we test hot-add/hot-remove node with stress, and meet the following
>> call trace several times.
>
> Thanks.
>
>>
>> next_online_pgdat()
>> int nid = next_online_node(pgdat->node_id); // it's here, pgdat is NULL
>
> memset(pgdat, 0, sizeof(*pgdat));
> This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is
> NULL* is strange here.
> But anyway, the bug is real, we must fix it.

next_zone()
pg_data_t *pgdat = zone->zone_pgdat; // I think this pgdat is NULL, and NODE_DATA() is not NULL.
...
pgdat = next_online_pgdat(pgdat);
int nid = next_online_node(pgdat->node_id); // so here is the null pointer access

Thanks for your new patch, I'll test it.

Thanks,
Xishi Qiu

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/