Re: [PATCH] vsprintf: protect kernel from panic due to non-canonical pointer dereference

From: Haakon Bugge
Date: Wed Oct 19 2022 - 07:17:23 EST




> On 18 Oct 2022, at 22:49, Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote:
>> On 10/18/2022 1:07 PM, Andy Shevchenko wrote:
>>> On Tue, Oct 18, 2022 at 06:56:31PM +0000, Jane Chu wrote:
>>>> On 10/18/2022 5:45 AM, Petr Mladek wrote:
>>>>> On Mon 2022-10-17 19:31:53, Jane Chu wrote:
>>>>>> On 10/17/2022 12:25 PM, Andy Shevchenko wrote:
>>>>>>> On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote:
>>>>>>>> While debugging a separate issue, it was found that an invalid string
>>>>>>>> pointer could very well contain a non-canonical address, such as
>>>>>>>> 0x7665645f63616465. In that case, this line of defense isn't enough
>>>>>>>> to protect the kernel from crashing due to a general protection fault:
>>>>>>>>
>>>>>>>> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
>>>>>>>> return "(efault)";
>>>>>>>>
>>>>>>>> So instead, use kern_addr_valid() to validate the string pointer.
>>>>>>>
>>>>>>> How did you check that value of the (invalid string) pointer?
>>>>>>>
>>>>>>
>>>>>> In the bug scenario, the invalid string pointer was an out-of-bound
>>>>>> string pointer. While the OOB referencing is fixed,
>>>>>
>>>>> Could you please provide more details about the fixed OOB?
>>>>> What exact vsprintf()/printk() call was broken and eventually
>>>>> how it was fixed, please?
>>>>
>>>> For confidentiality reasons, I'd like to avoid mentioning the specific name of
>>>> the sysfs attribute in the bug, instead, just call it "devX_attrY[]",
>>>> and describe the precise nature of the issue.
>>>>
>>>> devX_attrY[] is a string array, declared and filled at compile time,
>>>> like
>>>> const char * const devX_attrY[] = {
>>>> [ATTRY_A] = "Dev X AttributeY A",
>>>> [ATTRY_B] = "Dev X AttributeY B",
>>>> ...
>>>> [ATTRY_G] = "Dev X AttributeY G",
>>>> };
>>>> such that, when user "cat /sys/devices/systems/.../attry_1",
>>>> "Dev X AttributeY B" will show up in the terminal.
>>>> That's it, no more reference to the pointer devX_attrY[ATTRY_B] after that.
>>>>
>>>> The bug was that the index to the array was wrongfully produced,
>>>> leading up to OOB, e.g. devX_attrY[11]. The fix was to fix the
>>>> calculation and that is not an upstream fix.
>>>>
>>>>>
>>>>>> the lingering issue
>>>>>> is that the kernel ought to be able to protect itself, as the pointer
>>>>>> contains a non-canonical address.
>>>>>
>>>>> Was the pointer used only by the vsprintf()?
>>>>> Or was it accessed also by another code, please?
>>>>
>>>> The OOB pointer was used only by vsprintf() for the "cat" sysfs case.
>>>> No other code uses the OOB pointer, verified both by code examination
>>>> and test.
>>>
>>> So, then the vsprintf() is _the_ point to crash and why should we hide that?
>>> Because of the crash you found the culprit, right? The efault will hide very
>>> important details.
>>>
>>> So to me it sounds like I like this change less and less...
>>
>> What about the existing check
>> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
>> return "(efault)";
>> ?
>
> Because it's _special_. We know that the first page is equivalent to a NULL pointer
> and the last one is dedicated to so-called error pointers. There are no other
> special exceptions for addresses in the Linux kernel (I'm not talking about
> the alignment requirements of certain architectures).
>
>> In an experiment just to print the raw OOB pointer values, I saw below
>> (the devX attrY stuff are substitutes of the real attributes, other
>> values and strings are verbatim copy from "dmesg"):
>>
>> [ 3002.772329] devX_attrY[26]: (ffffffff84d60ad3) Dev X AttributeY E
>> [ 3002.772346] devX_attrY[27]: (ffffffff84d60ae4) Dev X AttributeY F
>> [ 3002.772347] devX_attrY[28]: (ffffffff84d60aee) Dev X AttributeY G
>> [ 3002.772349] devX_attrY[29]: (0) (null)
>> [ 3002.772350] devX_attrY[30]: (0) (null)
>> [ 3002.772351] devX_attrY[31]: (0) (null)
>> [ 3002.772352] devX_attrY[32]: (7665645f63616465) (einval)
>> [ 3002.772354] devX_attrY[33]: (646e61685f656369) (einval)
>> [ 3002.772355] devX_attrY[34]: (6f635f65755f656c) (einval)
>> [ 3002.772355] devX_attrY[35]: (746e75) (einval)
>>
>> where starting from index 29 are all OOB pointers.
>>
>> As you can see, when the OOB pointers are NULL, "(null)" is printed thanks
>> to the existing check. When the OOB pointers turn out to be non-canonical,
>> that is detectable as well: the fact that the pointer value deviates from
>> (ffffffff84d60aee + 4 * sizeof(void *))
>> evidently shows that the OOBs are detectable.
>>
>> The question then is why should the non-canonical OOBs be treated
>> differently from NULL and ERR_VALUE?
>
> Obviously, to see the crash. And to let the kernel _crash_. Isn't that what we
> need in order to see a bug as early as possible?

If you follow that argument, why doesn't the kernel crash when the pointer is, e.g., a NULL pointer? According to you, shouldn't it crash as early as possible in that case also?
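
To make the three cases under discussion concrete, here is a minimal userspace sketch of how they differ. It is not kernel code and not the proposed patch: it assumes x86-64 with 48-bit virtual addresses and 4 KiB pages, and every sketch_* / SKETCH_* name is hypothetical. The first two tests mirror the quoted check ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)); the third is a plain canonical-address test, which is not necessarily what kern_addr_valid() does.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */
#define SKETCH_MAX_ERRNO 4095ULL /* same value as the kernel's MAX_ERRNO */
#define SKETCH_VA_BITS   48      /* assumption: 48-bit virtual addresses */

enum sketch_ptr_class {
    SKETCH_FIRST_PAGE,    /* treated like a NULL pointer */
    SKETCH_ERR_PTR,       /* last page, reserved for error pointers */
    SKETCH_NON_CANONICAL, /* would raise a general protection fault */
    SKETCH_PLAUSIBLE,     /* passes all three tests, may still be unmapped */
};

/* Canonical means bits 63..47 are either all zero or all one. */
static bool sketch_is_canonical(uint64_t addr)
{
    uint64_t upper = addr >> (SKETCH_VA_BITS - 1);

    return upper == 0 || upper == (UINT64_MAX >> (SKETCH_VA_BITS - 1));
}

static enum sketch_ptr_class sketch_classify(uint64_t addr)
{
    if (addr < SKETCH_PAGE_SIZE)
        return SKETCH_FIRST_PAGE;
    if (addr >= (uint64_t)-SKETCH_MAX_ERRNO)
        return SKETCH_ERR_PTR;
    if (!sketch_is_canonical(addr))
        return SKETCH_NON_CANONICAL;
    return SKETCH_PLAUSIBLE;
}
```

With these assumptions, sketch_classify(0x7665645f63616465) lands in the non-canonical class, while ffffffff84d60ad3 from the dmesg output above passes all three tests.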


Thxs, Håkon