Re: [PATCH] devcoredump: increase the device delete timeout to 10 mins

From: Abhinav Kumar
Date: Sat Feb 12 2022 - 02:52:52 EST


Hi Greg

On 2/11/2022 11:04 PM, Greg KH wrote:
On Fri, Feb 11, 2022 at 10:59:39AM -0800, Abhinav Kumar wrote:
Hi Greg

Thanks for the response.

On 2/11/2022 3:09 AM, Greg KH wrote:
On Tue, Feb 08, 2022 at 11:44:32AM -0800, Abhinav Kumar wrote:
There are cases where depending on the size of the devcoredump and the speed
at which the usermode reads the dump, it can take longer than the current 5 mins
timeout.

This can lead to incomplete dumps as the device is deleted once the timeout expires.

One example is below where it took 6 mins for the devcoredump to be completely read.

04:22:24.668 23916 23994 I HWDeviceDRM::DumpDebugData: Opening /sys/class/devcoredump/devcd6/data
04:28:35.377 23916 23994 W HWDeviceDRM::DumpDebugData: Freeing devcoredump node

What makes this so slow? Reading from the kernel shouldn't be the
limit, is it where the data is being sent to?

We are still checking this. We are seeing better read times when we bump up
the thread priority of the thread which was reading this.

Where is the thread sending the data to?

The thread is writing the data to a file in local storage. From our profiling,
the read is the one taking the time, not the write.
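
For reference, one way to separate read time from write time in a copy loop like this is sketched below. This is only an illustration, not the code used for the profiling mentioned above: the helper names (elapsed_s, copy_timed), the 4 KiB buffer size and the use of CLOCK_MONOTONIC are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: seconds elapsed between two monotonic timestamps. */
static double elapsed_s(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Hypothetical helper: copy src to dst, accumulating the time spent in
 * read() and in write() separately. Returns bytes copied, or -1 on error.
 */
static long copy_timed(const char *src, const char *dst,
		       double *read_s, double *write_s)
{
	char buf[4096];		/* assumed chunk size */
	struct timespec t0, t1;
	long total = 0;
	ssize_t n;
	int in, out;

	*read_s = 0.0;
	*write_s = 0.0;

	in = open(src, O_RDONLY);
	if (in < 0)
		return -1;
	out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (out < 0) {
		close(in);
		return -1;
	}

	for (;;) {
		/* Time only the read() call. */
		clock_gettime(CLOCK_MONOTONIC, &t0);
		n = read(in, buf, sizeof(buf));
		clock_gettime(CLOCK_MONOTONIC, &t1);
		*read_s += elapsed_s(&t0, &t1);
		if (n <= 0)
			break;	/* 0 = end of file, -1 = error */

		/* Time the write(s) needed to flush this chunk. */
		clock_gettime(CLOCK_MONOTONIC, &t0);
		ssize_t off = 0;
		while (off < n) {
			ssize_t w = write(out, buf + off, (size_t)(n - off));
			if (w <= 0) {
				n = -1;	/* treat a failed or empty write as an error */
				break;
			}
			off += w;
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		*write_s += elapsed_s(&t0, &t1);
		if (n < 0)
			break;
		total += n;
	}

	close(in);
	close(out);
	return n < 0 ? -1 : total;
}
```

Calling copy_timed() with /sys/class/devcoredump/devcd6/data as src and the output file as dst would give the two totals to compare against the 5 min timeout.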


We are also trying to check if bumping up CPU speed is helping.
But, results have not been consistently good enough. So we thought we should
also increase the timeout to be safe.

Why would 10 minutes be better than 30? What should the limit be? :)

Again, this is from our profiling. We are seeing a worst case time of 7 mins
to finish the read for our data. That's where the 10 mins came from: just
doubling what we have currently. I am not sure where the current 5 mins
timeout came from.


thanks,

greg k-h