Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup

From: Jeff Merkey
Date: Wed Jan 20 2016 - 12:33:48 EST


On 1/20/16, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Wed, 20 Jan 2016, Jeff Merkey wrote:
>> On 1/20/16, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> > On Tue, 19 Jan 2016, Jeff Merkey wrote:
>> >> Nasty bug but trivial fix for this. What happens here is RAX (nsecs)
>> >> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>> >
>> > And how exactly does that happen?
>> >
>> > 0x17AE7F57C671EA7D = 1.70644e+18 nsec
>> > = 1.70644e+09 sec
>> > = 2.84407e+07 min
>> > = 474011 hrs
>> > = 19750.5 days
>> > = 54.1109 years
>> >
>> > That's the real issue, not what you are trying to 'fix' in
>> > timespec_add_ns()
>> >
>> >> Submitting a patch to fix this after I regress and test it. Since it
>> >> makes no sense to loop on a simple calculation, fix should be:
>> >>
>> >> static __always_inline void timespec_add_ns(struct timespec *a, u64
>> >> ns)
>> >> {
>> >> a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
>> >> a->tv_nsec = ns;
>> >> }
>> >
>> > No. It's not that simple, because div64_u64_rem() is expensive on 32bit
>> > architectures which have no hardware 64/32 division. And that's going
>> > to
>> > hurt
>> > for the normal tick case where we have at max one iteration.
>> >
>>
>> It's less expensive than a hard coded loop that subtracts in a looping
>> function as a substitute for dividing which is what is there. What a
>> busted piece of shit .... LOL
>
> Let's talk about shit.
>
> timespec[64]_add_ns() is used for timekeeping and in all normal use cases
> the
> nsec part is less than 1e9 nsec. Even on 64 bit a divide is more expensive
> than the sinlge iteration while loop and its insane expensive on 32bit
> machines which do not have a 64/32 divison in hardware.
>
> The while loop is there for a few corner cases which are a bit larger than
> 1e9
> nsecs, but that's not the case we optimize for.
>
> The case you are creating with your debugger is something completely
> different
> and we never thought about it nor cared about it. Why? Because so far
> nobody
> complained and I never cared about kernel debuggers at all.
>
> What's worse is that your 'fix' does not resolve the underlying issue at
> all. Why? Simply because you tried to fix the symptom and not the root
> cause.
>
> I explained you the root cause and I explained you why that while() loop is
> more efficient than a divide for the case it was written and optimized for.
>
> Instead of reading and understanding what I wrote you teach me that your
> divide is more efficient and call it a busted piece of shit.
>
> Sure you are free to call that a busted piece of shit, but you don't have
> to
> expect that the people who wrote, maintain and understand that code are
> going
> to put up with your attitude.
>
> Thanks,
>
> tglx
>
>
>
>

Thomas, so far the only thing I've gotten from you are whining
diatribes about email ettiquette and today is the FIRST time you have
responded to me with any arguments that have technical merit and
actually address the problem. That's what I am here for, not
anything else. As Linus has said repeatedly, being nice doesn't seem
to get results. Being direct and honest does.

I am here to make MDB run as well as I can make it on Linux and to
help make it run as well as possible on c86_64 and ia32. That's my
only objective here. That being said, I am still wanting to get
this fixed.

Will you help me?

Jeff