Re: [PATCH] delayacct: align to 8 byte boundary on 64-bit systems

From: Jeff Mahoney
Date: Wed Feb 17 2010 - 16:47:54 EST


On 02/12/2010 09:14 PM, Balbir Singh wrote:
> * Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> [2010-02-12 10:19:57]:
>
>> On Fri, 12 Feb 2010 11:48:27 -0500
>> Jeff Mahoney <jeffm@xxxxxxxx> wrote:
>>
>>> prepare_reply sets up an skb for the response. If I understand it correctly,
>>> the payload contains:
>>>
>>> +--------------------------------+
>>> | genlmsghdr - 4 bytes |
>>> +--------------------------------+
>>> | NLA header - 4 bytes | /* Aggregate header */
>>> +-+------------------------------+
>>> | | NLA header - 4 bytes | /* PID header */
>>> | +------------------------------+
>>> | | pid/tgid - 4 bytes |
>>
>> So we put another four zero bytes in here and add four to the "PID header".
>>
>
> Do you know if these breaks existing applications?

AFAIK it doesn't. The initial problem was reported by a partner and
there have been no reports of breakage since the fix was shipped in a
maintenance update for SLE11 in October. I think Andrew's guess about
users just copying the getdelays.c code is probably pretty accurate.

>>> | +------------------------------+
>>> | | NLA header - 4 bytes | /* stats header */
>>> | + -----------------------------+ <- oops. aligned on 4 byte boundary
>>> | | struct taskstats - 328 bytes |
>>> +-+------------------------------+
>>>
>>> The start of the taskstats struct must be 8 byte aligned on IA64 (and other
>>> systems with 8 byte alignment rules for 64-bit types) or runtime alignment
>>> warnings will be issued.
>>>
>>> This patch pads the pid/tgid field out to sizeof(long), which forces
>>> the alignment of taskstats. The getdelays userspace code is ok with this
>>> since it assumes 32-bit pid/tgid and then honors that header's length field.
>>>
>
> Could you define OK above? Does the application work without
> recompiling? Have you checked for endianness issues?

Yes, it seemed to work as before. I just used getdelays.c to test it.

>>> An array is used to avoid exposing kernel memory contents to userspace in the
>>> response.
>>>
>>> Signed-off-by: Jeff Mahoney <jeffm@xxxxxxxx>
>>> ---
>>> kernel/taskstats.c | 8 +++++++-
>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> --- a/kernel/taskstats.c
>>> +++ b/kernel/taskstats.c
>>> @@ -362,6 +362,12 @@ static struct taskstats *mk_reply(struct
>>> struct nlattr *na, *ret;
>>> int aggr;
>>>
>>> + /* If we don't pad, we end up with alignment on a 4 byte boundary.
>>> + * This causes lots of runtime warnings on systems requiring 8 byte
>>> + * alignment */
>>> + u32 pids[2] = { pid, 0 };
>
> Shouldn't this be endianness dependent?

It could if the user casts to a 64-bit type instead of a 32-bit type,
but that assumption was covered in the comment about the pids and tgids
being 32-bit.

>>> + int pid_size = ALIGN(sizeof(pid), sizeof(long));
>>> +
>>> aggr = (type == TASKSTATS_TYPE_PID)
>>> ? TASKSTATS_TYPE_AGGR_PID
>>> : TASKSTATS_TYPE_AGGR_TGID;
>>> @@ -369,7 +375,7 @@ static struct taskstats *mk_reply(struct
>>> na = nla_nest_start(skb, aggr);
>>> if (!na)
>>> goto err;
>>> - if (nla_put(skb, type, sizeof(pid), &pid) < 0)
>>> + if (nla_put(skb, type, pid_size, pids) < 0)
>>> goto err;
>>> ret = nla_reserve(skb, TASKSTATS_TYPE_STATS, sizeof(struct taskstats));
>>> if (!ret)
>>
>> So any code which assumes that the pid/tgid field is four bytes long
>> will break. Code which takes that length from the netlink message
>> header will work OK.
>>
>
> I think we assume that PID/TGID is 32 bits long, we use the length to
> extract the rest of the message. We cast NLA_DATA to int* in
> getdelays.c.
>
>> 32-bit architectures are unaltered.
>>
>> Seems safe enough. We'd be safer still if we didn't do this on 64-bit
>> architectures which don't need it. ie: x86_64. But if we do that we
>> add a risk that people will develop shoddy code which works on x86_64
>> and doesn't work on ia64.
>>
>
> May be, this deserves an ABI and version bump in taskstats. I'll test
> this patch soon.

That could be too. Though if an ABI bump is in the works, then it's
probably better to not fix it this way and have an empty attribute type
instead.

-Jeff

--
Jeff Mahoney
SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/