Re: [PATCH v7] checkkconfigsymbols.sh: reimplementation in python

From: Paul Bolle
Date: Mon Sep 29 2014 - 06:28:52 EST


[Added linux-kbuild@xxxxxxxxxxxxxxxx]

On Sun, 2014-09-28 at 17:55 +0200, Valentin Rothberg wrote:
> The scripts/checkkconfigsymbols.sh script searches Kconfig features
> in the source code that are not defined in Kconfig. Such identifiers
> always evaluate to false and are the source of various kinds of bugs.
> However, the shell script is slow and it does not detect such broken
> references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINEDÂÂ).
> Furthermore, it generates false positives. The script is also hard to
> read and understand, and is thereby difficult to maintain.
>
> This patch replaces the shell script with an implementation in Python,
> which:
> (a) detects the same bugs, but does not report previous false positives
> (b) additionally detects broken references in Kconfig and all
> non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
> (c) is up to 75 times faster than the shell script
> (d) only checks files under version control ('git ls-files')

(The shell script is .git unaware. If you happen to have one or more
branches with "Kconfig" in their name, as I do, it generates a lot of
noise on stderr.)

> The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
> from 3m47s to 0m3s, and reports 939 broken identifiers in Linux v3.17-rc1;
> 420 additional reports of which 16 are located in Kconfig files,
> 287 in defconfigs, 63 in ./Documentation, 1 in Kbuild.
>
> Moreover, we intentionally include references in comments, which have been
> ignored until now. Such comments may be leftovers of features that have
> been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */ÂÂ).
> These references can be misleading and should be removed or replaced.
>
> Note that the output format changed from (file list <tab> feature) to
> (feature <tab> file list) as it simplifies the detection of the Kconfig
> feature for long file lists.
>
> Signed-off-by: Valentin Rothberg <valentinrothberg@xxxxxxxxx>
> Signed-off-by: Stefan Hengelein <stefan.hengelein@xxxxxx>
> ---
> Changelog:
> v2: Fix of regular expressions
> v3: Changelog replacement, and add changes of v2
> v4: Based on comments from Paul Bolle <pebolle@xxxxxxxxxx>
> - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
> - Changes of regular expressions
> - Increases additional reports from 49 to 229 compared to v3
> - Change of output format from (file list <tab> feature) to
> (feature <tab> file list)Â
> v5: Only analyze files under version control ('git ls-files')
> v6: Cover features with numbers and small letters (e.g., 4xx)
> v7: Add changes of v6 (lost 'git add') and filter FOO/BAR features
> ---
> scripts/checkkconfigsymbols.py | 138 +++++++++++++++++++++++++++++++++++++++++
> scripts/checkkconfigsymbols.sh | 59 ------------------
> 2 files changed, 138 insertions(+), 59 deletions(-)
> create mode 100644 scripts/checkkconfigsymbols.py
> delete mode 100755 scripts/checkkconfigsymbols.sh
>
> diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
> new file mode 100644
> index 0000000..01bf9c4
> --- /dev/null
> +++ b/scripts/checkkconfigsymbols.py
> @@ -0,0 +1,138 @@
> +#!/usr/bin/env python
> +
> +"""Find Kconfig identifiers that are referenced but not defined."""
> +
> +# (c) 2014 Valentin Rothberg <valentinrothberg@xxxxxxxxx>
> +# (c) 2014 Stefan Hengelein <stefan.hengelein@xxxxxx>
> +#
> +# Licensed under the terms of the GNU GPL License version 2
> +
> +
> +import os
> +import re
> +from subprocess import Popen, PIPE, STDOUT
> +
> +
> +# regex expressions
> +OPERATORS = r"&|\(|\)|\||\!"
> +FEATURE = r"(?:\w*[A-Z0-9]\w*){2,}"
> +DEF = r"^\s*(?:menu){,1}config\s+(" + FEATURE + r")\s*"
> +EXPR = r"(?:" + OPERATORS + r"|\s|" + FEATURE + r")+"
> +STMT = r"^\s*(?:if|select|depends\s+on)\s+" + EXPR

Could please make that "depends on"? Yes, it seems the yacc grammar
accepts any amount of whitespace, but that doesn't make it right to use
anything other than a single space. (Can the yacc grammar be tweaked to
see "depends on" as one, well, token?)

> +
> +# regex objects
> +REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
> +REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
> +REGEX_SOURCE_FEATURE = re.compile(r"(?:D|\W|\b)+CONFIG_(" + FEATURE + r")")

That should be "-D", because now you get a hit on
TPS6507X_ADCONFIG_CONVERT_TS

and friends. What does \W do, by the way?

> +REGEX_KCONFIG_DEF = re.compile(DEF)
> +REGEX_KCONFIG_EXPR = re.compile(EXPR)
> +REGEX_KCONFIG_STMT = re.compile(STMT)
> +REGEX_KCONFIG_HELP = re.compile(r"^\s+(help|---help---)\s*$")
> +REGEX_FILTER_FEATURES = re.compile(r"[A-Za-z0-9]$")
> +
> +
> +def main():
> + """Main function of this module."""
> + source_files = []
> + kconfig_files = []
> + defined_features = set()
> + referenced_features = dict() # {feature: [files]}
> +
> + # use 'git ls-files' to get the worklist
> + pop = Popen("git ls-files", stdout=PIPE, stderr=STDOUT, shell=True)
> + (stdout, _) = pop.communicate() # wait until finished
> + if len(stdout) > 0 and stdout[-1] == "\n":
> + stdout = stdout[:-1]
> +
> + for gitfile in stdout.rsplit("\n"):
> + if ".git" in gitfile or "ChangeLog" in gitfile or \
> + os.path.isdir(gitfile):

(The only directories in the git tree are a few symlinks:
arch/arm/boot/dts/include/dt-bindings
arch/metag/boot/dts/include/dt-bindings
arch/mips/boot/dts/include/dt-bindings
arch/powerpc/boot/dts/include/dt-bindings

Filtering out symlinks makes sense anyway. But that's probably not worth
the effort.)

You might also consider filtering out "Next/merge.log". It tends to have
references to Kconfig macros. But it's only a few hits anyway.

> + continue
> + if REGEX_FILE_KCONFIG.match(gitfile):
> + kconfig_files.append(gitfile)
> + else:
> + # all non-Kconfig files are checked for consistency
> + source_files.append(gitfile)
> +
> + for sfile in source_files:
> + parse_source_file(sfile, referenced_features)
> +
> + for kfile in kconfig_files:
> + parse_kconfig_file(kfile, defined_features, referenced_features)
> +
> + print "Undefined symbol used\tFile list"
> + for feature in sorted(referenced_features):
> + # filter some false positives
> + if feature == "FOO" or feature == "BAR" or \
> + feature == "FOO_BAR":
> + continue
> + if feature not in defined_features:
> + if feature.endswith("_MODULE"):
> + # avoid false positives for kernel modules
> + if feature[:-len("_MODULE")] in defined_features:
> + continue
> + files = referenced_features.get(feature)
> + print "%s\t%s" % (feature, ", ".join(files))
> +
> +
> +def parse_source_file(sfile, referenced_features):
> + """Parse @sfile for referenced Kconfig features."""
> + lines = []
> + with open(sfile, "r") as stream:
> + lines = stream.readlines()
> +
> + for line in lines:
> + if not "CONFIG_" in line:
> + continue
> + features = REGEX_SOURCE_FEATURE.findall(line)
> + for feature in features:
> + if not REGEX_FILTER_FEATURES.search(feature):
> + continue
> + sfiles = referenced_features.get(feature, set())
> + sfiles.add(sfile)
> + referenced_features[feature] = sfiles
> +
> +
> +def get_features_in_line(line):
> + """Return mentioned Kconfig features in @line."""
> + return REGEX_FEATURE.findall(line)
> +
> +
> +def parse_kconfig_file(kfile, defined_features, referenced_features):
> + """Parse @kfile and update feature definitions and references."""
> + lines = []
> + skip = False
> +
> + with open(kfile, "r") as stream:
> + lines = stream.readlines()
> +
> + for i in range(len(lines)):
> + line = lines[i]
> + line = line.strip('\n')
> + line = line.split("#")[0] # ignore comments
> +
> + if REGEX_KCONFIG_DEF.match(line):
> + feature_def = REGEX_KCONFIG_DEF.findall(line)
> + defined_features.add(feature_def[0])
> + skip = False
> + elif REGEX_KCONFIG_HELP.match(line):
> + skip = True
> + elif skip:
> + # ignore content of help messages
> + pass
> + elif REGEX_KCONFIG_STMT.match(line):
> + features = get_features_in_line(line)
> + # multi-line statements
> + while line.endswith("\\"):
> + i += 1
> + line = lines[i]
> + line = line.strip('\n')
> + features.extend(get_features_in_line(line))
> + for feature in set(features):
> + paths = referenced_features.get(feature, set())
> + paths.add(kfile)
> + referenced_features[feature] = paths
> +
> +
> +if __name__ == "__main__":
> + main()
>[...]

This seems to find, roughly, what my local perl script finds. It does
skip references to Kconfig macros in the Kconfig help texts and
comments: grep for CONFIG_ITANIUM_ASTEP_SPECIFIC as an example. But
there are so few of those that it's probable not worth the trouble to
check for them too.

A few test show the speedup is especially nice with the entire tree in
cache: run time drops from over four minutes to just under five seconds
here. Provided you look into my comments, this is:

Acked-by: Paul Bolle <pebolle@xxxxxxxxxx>

Thanks,


Paul Bolle

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/