Re: [RFC v3] debug: prevent entering debug mode on errors

From: Kiran Raparthy
Date: Mon Dec 01 2014 - 01:03:26 EST


Hi Jason,

On 27 November 2014 at 15:19, Daniel Thompson
<daniel.thompson@xxxxxxxxxx> wrote:
> On 26/11/14 17:45, Colin Cross wrote:
>> On Wed, Nov 26, 2014 at 1:14 AM, Kiran Raparthy <kiran.kumar@xxxxxxxxxx> wrote:
>>> From: Colin Cross <ccross@xxxxxxxxxxx>
>>>
>>> debug: prevent entering debug mode on errors
>>>
>>> On non-developer devices kgdb prevents CONFIG_PANIC_TIMEOUT from rebooting the
>>> device after a panic.
>>>
>>> In case of panics and exceptions, to honor CONFIG_PANIC_TIMEOUT, prevent
>>> entering debug mode to avoid getting stuck waiting for the user to interact
>>> with debugger.
>>>
>>> Cc: Jason Wessel <jason.wessel@xxxxxxxxxxxxx>
>>> Cc: kgdb-bugreport@xxxxxxxxxxxxxxxxxxxxx
>>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>>> Cc: Android Kernel Team <kernel-team@xxxxxxxxxxx>
>>> Cc: John Stultz <john.stultz@xxxxxxxxxx>
>>> Cc: Sumit Semwal <sumit.semwal@xxxxxxxxxx>
>>> Signed-off-by: Colin Cross <ccross@xxxxxxxxxxx>
>>> [Kiran: Added context to commit message.
>>> panic_timeout is used instead of break_on_panic and
>>> break_on_exception to honor CONFIG_PANIC_TIMEOUT]
>>> Signed-off-by: Kiran Raparthy <kiran.kumar@xxxxxxxxxx>
>>> ---
>>> kernel/debug/debug_core.c | 17 +++++++++++++++++
>>> 1 file changed, 17 insertions(+)
>>>
>>> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
>>> index 1adf62b..0012a1f 100644
>>> --- a/kernel/debug/debug_core.c
>>> +++ b/kernel/debug/debug_core.c
>>> @@ -689,6 +689,14 @@ kgdb_handle_exception(int evector, int signo, int ecode, struct pt_regs *regs)
>>>
>>>  	if (arch_kgdb_ops.enable_nmi)
>>>  		arch_kgdb_ops.enable_nmi(0);
>>> +	/*
>>> +	 * Avoid entering the debugger if we were triggered due to an oops
>>> +	 * but panic_timeout indicates the system should automatically
>>> +	 * reboot on panic. We don't want to get stuck waiting for input
>>> +	 * on such systems, especially if it's "just" an oops.
>>> +	 */
>>> +	if (signo != SIGTRAP && panic_timeout)
>>> +		return 1;
>>>
>>>  	memset(ks, 0, sizeof(struct kgdb_state));
>>>  	ks->cpu = raw_smp_processor_id();
>>> @@ -821,6 +829,15 @@ static int kgdb_panic_event(struct notifier_block *self,
>>>  			    unsigned long val,
>>>  			    void *data)
>>>  {
>>> +	/*
>>> +	 * Avoid entering the debugger if we were triggered due to a panic.
>>> +	 * We don't want to get stuck waiting for input from the user in
>>> +	 * such a case. panic_timeout indicates the system should
>>> +	 * automatically reboot on panic.
>>> +	 */
>>> +	if (panic_timeout)
>>> +		return NOTIFY_DONE;
>>> +
>>>  	if (dbg_kdb_mode)
>>>  		kdb_printf("PANIC: %s\n", (char *)data);
>>>  	kgdb_breakpoint();
>>
>> The original patch was more useful as it allowed re-enabling break on
>> panic on specific devices where you were trying to debug a
>> reproducible issue. What about using a module_param similar to
>> kgdbreboot, but setting the default based on CONFIG_PANIC_TIMEOUT to
>> avoid extra configuration?
>
> This change was due to my review so perhaps I'd better answer this...
>
> panic_timeout is the value of the panic sysctl. In addition to the
> normal sysctl tooling (which I don't think is available on most android
> systems), its value can be set using panic=0 on the kernel command line
> or via /proc/sys/kernel/panic at runtime.
>
> CONFIG_PANIC_TIMEOUT merely sets the default value of the sysctl. I
> guess perhaps the patch description could be improved to make this clearer.
>
> Therefore, the only loss of function I expected versus the original is
> that it would be hard to get as far as a reproducible panic if the
> system also has a ton of reproducible oopses that we don't want to fix.
> Is such a use-case important?

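For reference, the behaviour of the two v3 checks can be summarised with a
small userspace model. This is only a sketch, not kernel code: the function
names below are made up for illustration, and panic_timeout is passed in as a
plain argument standing for the current value of the panic sysctl (set via
CONFIG_PANIC_TIMEOUT, panic= on the command line, or /proc/sys/kernel/panic).

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check added to kgdb_handle_exception().
 * Returns true if the debugger would still be entered.
 */
static bool enters_debugger_on_exception(int signo, int panic_timeout)
{
	/*
	 * SIGTRAP is exempt from the check; any other signal is skipped
	 * whenever panic_timeout is non-zero.
	 */
	if (signo != SIGTRAP && panic_timeout)
		return false;
	return true;
}

/*
 * Hypothetical model of the check added to kgdb_panic_event().
 * Returns true if the panic notifier would still call kgdb_breakpoint().
 */
static bool enters_debugger_on_panic(int panic_timeout)
{
	/* A non-zero timeout means the system is expected to reboot itself. */
	return panic_timeout == 0;
}
```

With panic_timeout set to 0 both helpers return true, so writing 0 to
/proc/sys/kernel/panic (or booting with panic=0) should bring back the
break-on-panic behaviour on a device under debug.
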
Could you please let me know if this patch is good to move from RFC to PATCH?

Regards,
Kiran