Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup

From: Thomas Gleixner
Date: Tue Jan 19 2016 - 04:51:36 EST


On Mon, 18 Jan 2016, Jeff Merkey wrote:
> What is strange is the math it's doing. It is subtracting a fixed
> value from rax, then comparing the value. It looks like this is a case
> where the value may have wrapped and the code just wasn't set up to
> handle it.

Well, in the worst case it would loop another full round.
If you subtract 1e9 from rax often enough, it will become smaller than 1e9
no matter whether it wrapped or not. It just takes more iterations.

> 0xffffffff810ede1d 482D00CA9A3B sub rax,0x3b9aca00
> 0xffffffff810ede23 83C201 add edx,0x1
> 0xffffffff810ede26 483DFFC99A3B cmp rax,0x3b9ac9ff
> 0xffffffff810ede2c 77EF ja ktime_get_ts64+0x9d (0xffffffff810ede1d) (up)
>
> The C code is:
>
>
> /**
>  * ktime_get_ts64 - get the monotonic clock in timespec64 format
>  * @ts: pointer to timespec variable
>  *
>  * The function calculates the monotonic clock from the realtime
>  * clock and the wall_to_monotonic offset and stores the result
>  * in normalized timespec64 format in the variable pointed to by @ts.
>  */
> void ktime_get_ts64(struct timespec64 *ts)
> {
> 	struct timekeeper *tk = &tk_core.timekeeper;
> 	struct timespec64 tomono;
> 	s64 nsec;
> 	unsigned int seq;
>
> 	WARN_ON(timekeeping_suspended);
>
> 	do {
> 		seq = read_seqcount_begin(&tk_core.seq);
> 		ts->tv_sec = tk->xtime_sec;
> 		nsec = timekeeping_get_ns(&tk->tkr_mono);
> 		tomono = tk->wall_to_monotonic;
> <<<
> 	} while (read_seqcount_retry(&tk_core.seq, seq));
> <<<
> 	ts->tv_sec += tomono.tv_sec;
> 	ts->tv_nsec = 0;
> 	timespec64_add_ns(ts, nsec + tomono.tv_nsec);
> }
> EXPORT_SYMBOL_GPL(ktime_get_ts64);
>
> Any ideas how to fix this problem? That do {} while gets stuck there.

So now you are pointing to that do { } while. That has absolutely nothing to
do with timespec64_add_ns() to which you are referring above.

That do {} while loop only repeats when the timekeeper sequence counter has
changed while we were reading the time and the offset, so it can only get
stuck if the counter keeps changing underneath us.

So where exactly is it stuck? Please provide backtraces from the
hardlockup detector.

Thanks,

tglx