Re: [PATCH] tty: fix data race in n_tty_receive_buf_common

From: Kohli, Gaurav
Date: Thu Jan 04 2018 - 08:46:56 EST



Which tty driver ? serial/msm_serial.c ?

We are using our internal driver, msm_geni_serial.c
Ok no what I need to see is a trace of what each CPU is doing at the
point you detect the problem. That way we can see what the path that
races is.
Below is stack trace running by init in our case on one core
-006|n_tty_open(
ÂÂÂ |ÂÂÂ tty = 0xFFFFFFFF477AC880 -> (
ÂÂÂ |ÂÂÂÂÂ disc_data = 0xFFFFFF80197AD000,

ÂÂÂ |ÂÂÂÂÂ port = 0xFFFFFFFFEDE40000))
ÂÂÂ |Â ldata = 0xFFFFFF80197AD000

ÂÂÂ |Â trace_printk_fmt = 0xFFFFFF9F275125F8
-007|tty_ldisc_open.isra.3(
ÂÂÂ |ÂÂÂ tty = 0xFFFFFFFF477AC880)
-008|tty_ldisc_setup(

-009|tty_init_dev(
ÂÂÂ |ÂÂÂ driver = 0xFFFFFFFFEDE2A480,
ÂÂÂ |ÂÂÂ idx = 0)

-010|tty_open_by_driver(inline)
-010|tty_open(

Core 2:
-000|n_tty_receive_buf_common(
ÂÂÂ |ÂÂÂ tty = 0xFFFFFFFF477AC880,

ÂÂÂ |Â ?)
ÂÂÂ |Â ldata_=_0x0
ÂÂÂ |Â __func__ = (110, 95, 116, 116, 121, 95, 114, 101, 99, 101, 105, 118, 101, 95, 98, 117, 102, 95, 99, 111, 109, 109, 111, 110, 0)
ÂÂÂ |Â __u = (__val = 7079195495121566464, __c = (0))
ÂÂÂ |Â c = 127
ÂÂÂ |Â ldata = 0xFFFFFFFFF40DF97C

ÂÂÂ |Â c = 0
ÂÂÂ |Â ldata = 0xFFFFFF9F26F46000

-001|n_tty_receive_buf2(
ÂÂÂ |ÂÂÂ tty = 0xFFFFFFFF477AC880,

-002|tty_ldisc_receive_buf(inline)
-002|receive_buf(inline)
-002|flush_to_ldisc(

Please let me know in case some other trace required
We have seen this issue on 4.9 and also one thing i have observed,
before tty is getting reinit in tty_init_dev(),
When yo stop the DMA is it instantaneous or does it cause a final
interrupt after you return from stop_rx ?

To me it still looks like data is being queued after the port is told to
stop but that's not a certainty.
This geni is based on FIFO.
I have also put ftraces and from that we can see open is not able to finish but there is request of flushing:
ÂÂÂ ÂÂÂÂ kworker/-15514ÂÂ 2.... 35206.979226: workqueue_execute_start: work struct 0xffffffffede40008: function flush_to_ldisc

ÂÂÂ ÂÂÂ Âkworker/-15514ÂÂ 2.... 35206.979237: bprint: n_tty_receive_buf_common: <Debug>n_tty_receive_buf_common tty=0xffffffff477ac880 ldata=(nil)

ÂÂÂ ÂÂÂ ÂÂÂÂÂÂÂÂÂÂ init-1ÂÂÂÂÂÂ 4.... 35206.979751: bprint:ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ n_tty_open: <Debug>n_tty_open tty=0xffffffff477ac880 ldata=0xffffff80197ad000
there is console service exited before it in all the dumps.
Â35206.969644:ÂÂ <2> init: Service 'console' (pid 7440) exited with
status 130
Â35206.969690:ÂÂ <2> init: Sending signal 9 to service 'console' (pid
7440) process group...
Â35206.970857:ÂÂ <2> init: kill(7440, 9) failed: No such process.

So how can we stop request of receive buff, if we already have tty_port
and tty is getting reinitialized in midway like above
case?
Is the port your console device. If you use a different port as a console
device does the problem go away - that could be a very important detail
as the hangup behaviour for the two is quite different.

Yes , we are using ttymsm0 as console device, this is the only port we are using.

So it seems here, we are getting flush request when init reinitialize the tty for same port. Please let me know
if some other debug logs are required.

Regards
Gaurav

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.