Re: [PATCH v3 3/3] nfc: trf7970a: Prevent repeated polling from crashing the kernel

From: Geoff Lansberry
Date: Sat Dec 24 2016 - 11:18:08 EST


Mark - I'm sorry, but I did not write this code and therefore was not
able to describe it accurately. It fixes a different issue, not the
neard segfault that we are still chasing. Last week Jaret Cantu sent
a separate email, with you copied, explaining the purpose of the code.
Did you see it? Does it explain to your satisfaction why the change
was made? I've asked him to join the effort to push the change
upstream, but he will not be available until the new year.

I know you suggested splitting that change off from the others; if
now is the time to do that, let me know. If you don't have the email
from Jaret, please let me know that as well and I will forward it to
you.

Geoff
Geoff Lansberry


Engineering Guy
Kuvée, Inc
125 Kingston St., 3rd Floor
Boston, MA 02111
1-617-290-1118 (m)
geoff.lansberry (skype)
http://www.kuvee.com



On Sat, Dec 24, 2016 at 1:01 AM, Mark Greer <mgreer@xxxxxxxxxxxxxxx> wrote:
> On Wed, Dec 21, 2016 at 11:18:34PM -0500, Geoff Lansberry wrote:
>> From: Jaret Cantu <jaret.cantu@xxxxxxxxxxx>
>>
>> Repeated polling attempts cause a NULL dereference error to occur.
>> This is because the state of the trf7970a is currently reading but
>> another request has been made to send a command before it has finished.
>>
>> The solution is to properly kill the waiting reading (workqueue)
>> before failing on the send.
>>
>> Signed-off-by: Geoff Lansberry <geoff@xxxxxxxxx>
>> ---
>
> You've still provided virtually no information on the actual problem(s)
> nor justified why you think this is the best solution. You're adding
> code to a section of code that should _never_ be executed so the only
> reasonable thing I can infer is that there are at least two problems:
>
> 1) There is a bug causing execution to get into this block of code.
>
> 2) Once in this block of code, there is another bug.
>
> You seem to be attempting to fix 2) and completely ignoring 1).
> 1) is the first bug that needs to be root-caused and fixed.
>
> Also, what exactly is the "NULL dereference error" you mention?
> Is this the neard crash you talked about in another thread or is
> this a kernel crash? If it is the kernel crash, please post the
> relevant information. If this is the neard crash - which seems
> unlikely - then how can changing a section of kernel code that
> shouldn't be executed in the first place fix that?
>
> Mark
> --