Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected

From: Martin Zaharinov
Date: Wed Dec 09 2020 - 14:13:38 EST




> On 9 Dec 2020, at 20:10, Guillaume Nault <gnault@xxxxxxxxxx> wrote:
>
> On Wed, Dec 09, 2020 at 06:57:44PM +0200, Martin Zaharinov wrote:
>>> On 9 Dec 2020, at 18:40, Guillaume Nault <gnault@xxxxxxxxxx> wrote:
>>> On Wed, Dec 09, 2020 at 04:47:52PM +0200, Martin Zaharinov wrote:
>>>> Hi All
>>>>
>>>> I have problem with latest kernel release
>>>> And the problem is base on this late problem :
>>>>
>>>>
>>>> https://www.mail-archive.com/search?l=netdev@xxxxxxxxxxxxxxx&q=subject:%22Re%5C%3A+ppp%5C%2Fpppoe%2C+still+panic+4.15.3+in+ppp_push%22&o=newest&f=1
>>>>
>>>> I have same problem in kernel 5.6 > now I use kernel 5.9.13 and have same problem.
>>>>
>>>>
>>>> In kernel 5.9.13 now don’t have any crashes in dimes but in one moment accel service stop with defunct and in log have many of this line :
>>>>
>>>>
>>>> error: vlan608: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
>>>> error: vlan617: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
>>>> error: vlan679: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
>>>>
>>>> In one moment connected user bump double or triple and after that service defunct and need wait to drop all session to start .
>>>>
>>>> I talk with accel-ppp team and they said this is kernel related problem and to back to kernel 4.14 there is not this problem.
>>>>
>>>> Problem is come after kernel 4.15 > and not have solution to this moment.
>>>
>>> I'm sorry, I don't understand.
>>> Do you mean that v4.14 worked fine (no crash, no ioctl() error)?
>>> Did the problem start appearing in v4.15? Or did v4.15 work and the
>>> problem appeared in v4.16?
>>
>> In Telegram group I talk with Sergey and Dimka and told my the problem is come after changes from 4.14 to 4.15
>> Sergey write this : "as I know, there was a similar issue in kernel 4.15 so maybe it is still not fixed"
>
> Ok, but what is your experience? Do you have a kernel version where
> accel-ppp reports no ioctl() error and doesn't crash the kernel?
Reported by Sergey and Dimka version 4.14 < don’t have this problem,
Only version after 4.15.0 > have problem and with patch only fix to don’t put crash log in dimes and freeze system.


>
> There wasn't a lot of changes between 4.14 and 4.15 for PPP.
> The only PPP patch I can see that might have been risky is commit
> 0171c4183559 ("ppp: unlock all_ppp_mutex before registering device").

For my changes is a atomic and skb kfree .
>
>> I don’t have options to test with this old kernel 4.14.xxx i don’t have support for them.
>>
>>
>>>
>>>> Please help to find the problem.
>>>>
>>>> Last time in link I see is make changes in ppp_generic.c
>>>>
>>>> ppp_lock(ppp);
>>>> spin_lock_bh(&pch->downl);
>>>> if (!pch->chan) {
>>>> /* Don't connect unregistered channels */
>>>> spin_unlock_bh(&pch->downl);
>>>> ppp_unlock(ppp);
>>>> ret = -ENOTCONN;
>>>> goto outl;
>>>> }
>>>> spin_unlock_bh(&pch->downl);
>>>>
>>>>
>>>> But this fix only to don’t display error and freeze system
>>>> The problem is stay and is to big.
>>>
>>> Do you use accel-ppp's unit-cache option? Does the problem go away if
>>> you stop using it?
>>>
>>
>> No I don’t use unit-cache , if I set unit-cache accel-ppp defunct same but user Is connect and disconnet more fast.
>>
>> The problem is same with unit and without .
>> Only after this patch I don’t see error in dimes but this is not solution.
>
> Soryy, what's "in dimes"?
> Do you mean that reverting commit 77f840e3e5f0 ("ppp: prevent
> unregistered channels from connecting to PPP units") fixes your problem?
Sorry text correct in dmesg*

I don’t make any changes of ppp part of kernel 5.9.13 I use clean build for ppp .


>
>> In network have customer what have power cut problem, when drop 600 user and back Is normal but in this moment kernel is locking and start to make this :
>> sessions:
>> starting: 4235
>> active: 3882
>> finishing: 378
>> The problem is starting session is not real user normal user in this server is ~4k customers .
>
> What type of session is it? L2TP, PPPoE, PPTP?
>
Session is PPPoE only with radius auth

>> I use pppd_compat .
>>
>> Any idea ?
>>
>>>>
>>>> Please help to fix.
>> Martin