Re: [PATCH 01/10] Add parse_integer() (replacement for simple_strto*())

From: Rasmus Villemoes
Date: Mon May 04 2015 - 12:44:55 EST


[I'm merging the subthreads below]

On Mon, May 04 2015, Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:

> On Mon, May 4, 2015 at 4:24 PM, Rasmus Villemoes
> <linux@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> First: Could you tell me what tree I can commit this on top of, to see
>> what gcc makes of it.
>
> Any recent kernel should be OK, code it quite self contained.
> I've just applied first two patches on top 4.1-rc2.
> BTW the correct order is
>
> 1) [PATCH 01/10] Add parse_integer() (replacement for simple_strto*())
> *** 2) [PATCH CORRECT 03/10] parse_integer: convert sscanf()
> 3) [PATCH 03/10] parse_integer: convert sscanf()
> 4) [PATCH 04/10] sscanf: fix overflow
> ...
> 10) [PATCH 10/10] ext2, ext3, ext4: convert to parse_integer()/kstrto*()
>
> I've copied patch #2 twice, so it won't apply and resent it
> with subject from patch #3 to confuse everyone even more.

You certainly successfully confused me :-)

>>> +#define parse_integer(s, base, val) \
>>> +({ \
>>> + const char *_s = (s); \
>>> + unsigned int _base = (base); \
>>> + typeof(val) _val = (val); \
>>> + \
>>> + __builtin_choose_expr( \
>>> + __builtin_types_compatible_p(typeof(_val), signed char *), \
>>> + _parse_integer_sc(_s, _base, (void *)_val),
>>> \
>>
>> Why the (void*) cast? Isn't _val supposed to have precisely the type
>> expected by _parse_integer_sc at this point?
>>
>>> + __builtin_choose_expr( \
>>> + __builtin_types_compatible_p(typeof(_val), long *) && sizeof(long) == 4,\
>>> + _parse_integer_i(_s, _base, (void *)_val), \
>>> + __builtin_choose_expr( \
>>> + __builtin_types_compatible_p(typeof(_val), long *) && sizeof(long) == 8,\
>>> + _parse_integer_ll(_s, _base, (void *)_val), \
>>> + __builtin_choose_expr( \
>>> + __builtin_types_compatible_p(typeof(_val), unsigned long *) && sizeof(unsigned long) == 4,\
>>> + _parse_integer_u(_s, _base, (void *)_val), \
>>> + __builtin_choose_expr( \
>>> + __builtin_types_compatible_p(typeof(_val), unsigned long *) && sizeof(unsigned long) == 8,\
>>> + _parse_integer_ull(_s, _base, (void *)_val), \
>>
>> Ah, I see. In these cases, one probably has to do a cast to pass a
>> (long*) as either (int*) or (long long*) - but why not cast to the type
>> actually expected by _parse_integer_* instead of the launder-anything (void*)?
>
> First macro was written without casts at all naively thinking
> that gcc will only typecheck in chosen __builtin_choose_expr branch.
> But it doesn't do that, remove casts and observe million of warnings.
> So I shut it up with "void *". Branch is chosen base on __b_t_c_p
> expression and I don't think it is possible to sneak in incorrect pointer.

I see. I was worried about something like the cast laundering away a
const qualifier, but (const int*) is not _b_t_c with (int*). So ok for
now - I'll play around with this a little and see if one can get rid of
the casts anyway.
>>
>> Is there any reason to disallow "-0"?
>
> No! -0 is not accepted because code is copied from kstrtoll()
> which doesn't accept "-0". It is even in the testsuite:
>
> static void __init test_kstrtoll_fail(void)
> {
> ...
> /* negative zero isn't an integer in Linux */
> {"-0", 0},
> {"-0", 8},
> {"-0", 10},
> {"-0", 16},
>
> Frankly I don't even remember why it does that, and
> no one noticed until now. libc functions accept "-0".

I think it's odd to accept "+0" but not "-0", but that's probably just
because I'm a mathematician. Am I right that you just added these test
cases because of the existing behaviour of kstrtoll? I suppose that
behaviour is just a historical accident.

If "-0" is not going to be accepted, I think that deserves a comment
(with rationale) in the parsing code and not hidden away in the test
suite.

>>> unsigned long long memparse(const char *ptr, char **retptr)
>>> {
>>> - char *endptr; /* local pointer to end of parsed string */
>>> + unsigned long long val;
>>>
>>> - unsigned long long ret = simple_strtoull(ptr, &endptr, 0);
>>> -
>>> - switch (*endptr) {
>>> + ptr += parse_integer(ptr, 0, &val);
>>
>> This seems wrong. simple_strtoull used to "sanitize" the return value
>> from the (old) _parse_integer, so that endptr still points into the
>> given string. Unconditionally adding the result from parse_integer may
>> make ptr point far before the actual string, into who-knows-what.
>
> When converting I tried to preserve the amount of error checking done.
> simple_strtoull() either
> a) return 0 and not advance pointer, or
> b) return something and advance pointer.
>

Are we talking about the same simple_strtoull? I see

cp = _parse_integer_fixup_radix(cp, &base);
rv = _parse_integer(cp, base, &result);
/* FIXME */
cp += (rv & ~KSTRTOX_OVERFLOW);

so cp is definitely advanced even in case of overflow. And in the case
of "underflow" (no digits found), the old code does initialize *result
to 0, while parse_integer by design doesn't write anything.

> Current code just ignores error case, so do I.

There's a difference between ignoring an error (which the current code
does), and ignoring _the possibility_ of an error (which the new code
does).

There are lots of callers of memparse(), and I don't think any of them
are prepared to handle *endp ending up pointing before the passed-in
string (-EINVAL == -22, -ERANGE == -34). I can easily see how that could
lead to an infinite loop, maybe worse.


[on the use of anonymous variables to get right type]

>> I suggest using the more obvious approach of declaring a union on the
>> stack and pass the address of the appropriate member.
>
> Union will work.
>
> No that it matters, it's a filesystem option parsing code
> which is executed rarely near the top of sys_mount(),
> not exactly perfomance bottleneck.
>
> It's a shame to lose such clever hack :-(

I'm sure you can find somewhere else to apply the hack :-)

> Just grep for simple_strto* and see how much error checking people do.
> The aim of the patchset is to convert code, error checking is separate
> story.

I think that's the wrong approach. One reason for adding this new
interface is precisely to facilitate error checking. So if you do some
example conversions, please do them _properly_ to show how error
checking is done.

As you say, it's easy to find places where error checking isn't done by
just grepping for simple_strto*. I don't want to have to add
parse_integer to that list. Abusing a new interface from the beginning
isn't really a recipe for happiness.

Converting uses of simple_strto*() to parse_integer() doesn't add value
if semantics of calling code is exactly the same. And it adds negative
value if the conversions change the semantics in a surprising way, as in
the memparse case above :-(

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/