Re: Subject: [RFC] clang tooling cleanups

From: Tom Rix
Date: Tue Oct 27 2020 - 15:34:09 EST



On 10/27/20 11:42 AM, Nick Desaulniers wrote:
> (cutting down the CC list to something more intimate)
>
> Tom, I'm over the moon to see clang-tidy being used this way. I
> totally forgot it could automatically apply fixits. I'm glad Nathan
> and Masahiro were able to get the foundation laid for running
> clang-tidy on the kernel this summer.
>
> On Tue, Oct 27, 2020 at 9:43 AM <trix@xxxxxxxxxx> wrote:
>> This rfc will describe
>> An upcoming treewide cleanup.
>> How clang tooling was used to programatically do the clean up.
>> Solicit opinions on how to generally use clang tooling.
>>
>> The clang warning -Wextra-semi-stmt produces about 10k warnings.
>> Reviewing these, a subset of semicolon after a switch looks safe to
>> fix all the time. An example problem
>>
>> void foo(int a) {
>> switch(a) {
>> case 1:
>> ...
>> }; <--- extra semicolon
>> }
>>
>> Treewide, there are about 100 problems in 50 files for x86_64 allyesconfig.
>> These fixes will be the upcoming cleanup.
>>
>> clang already supports fixing this problem. Add to your command line
>>
>> clang -c -Wextra-semi-stmt -Xclang -fixit foo.c
>>
>> foo.c:8:3: warning: empty expression statement has no effect;
>> remove unnecessary ';' to silence this warning [-Wextra-semi-stmt]
>> };
>> ^
>> foo.c:8:3: note: FIX-IT applied suggested code changes
>> 1 warning generated.
> Ah, doesn't that rely on clang-tidy to apply the fixit? (oh, what,
> maybe not: https://stackoverflow.com/a/49749277)
>
> And doesn't that require your patch to clang-tidy to land?
> https://reviews.llvm.org/D90180
>
> Now I'm confused; if clang can apply the fixit for warnings, why do we
> need another patch to clang-tidy?

No, this shows where the fixer is upstream.

I am in the process of pushing out the patches.

Long term the clang-tidy part of the build will change once it lands.

globbing the checker to -checker=-*,linuxkernel* would be easiest on the kernel

but that may not be where the checker lands.

>> The big problem is using this treewide is it will fix all 10k problems.
>> 10k changes to analyze and upstream is not practical.
>>
>> Another problem is the generic fixer only removes the semicolon.
>> So empty lines with some tabs need to be manually cleaned.
>>
>> What is needed is a more precise fixer.
>>
>> Enter clang-tidy.
>> https://clang.llvm.org/extra/clang-tidy/
>>
>> Already part of the static checker infrastructure, invoke on the clang
>> build with
>> make clang-tidy
>>
>> It is only a matter of coding up a specific checker for the cleanup.
>> Upstream this is review is happening here
>> https://reviews.llvm.org/D90180
> Sorry, I still don't understand how the clang-tidy checker wont also
> produce 10k fixes?

I am interested in treewide fixes.

Cleaning them up (maybe me not doing all the patches) and keeping them clean.

The clang -Wextra-semi-stmt -fixit will fix all 10,000 problems

This clang tidy fixer will fix only the 100 problems that are 'switch() {};'

When doing a treewide cleanup, batching a bunch of fixes that are the same problem and fix

is much easier on everyone to review and more likely to be accepted.


Long term, a c/i system would keep the tree clean by running the switch-semi checker/fixer.

And we can all move onto the next problem.

>
>> The development of a checker/fixer is
>> Start with a reproducer
>>
>> void foo (int a) {
>> switch (a) {};
>> }
>>
>> Generate the abstract syntax tree (AST)
>>
>> clang -Xclang -ast-dump foo.c
>>
>> `-FunctionDecl
>> |-ParmVarDecl
>> `-CompoundStmt
>> |-SwitchStmt
>> | |-ImplicitCastExpr
>> | | `-DeclRefExpr
>> | `-CompoundStmt
>> `-NullStmt
>>
>> Write a matcher to get you most of the way
>>
>> void SwitchSemiCheck::registerMatchers(MatchFinder *Finder) {
>> Finder->addMatcher(
>> compoundStmt(has(switchStmt().bind("switch"))).bind("comp"), this);
>> }
>>
>> The 'bind' method is important, it allows a string to be associated
>> with a node in the AST. In this case these are
>>
>> `-FunctionDecl
>> |-ParmVarDecl
>> `-CompoundStmt <-------- comp
>> |-SwitchStmt <-------- switch
>> | |-ImplicitCastExpr
>> | | `-DeclRefExpr
>> | `-CompoundStmt
>> `-NullStmt
>>
>> When a match is made the 'check' method will be called.
>>
>> void SwitchSemiCheck::check(const MatchFinder::MatchResult &Result) {
>> auto *C = Result.Nodes.getNodeAs<CompoundStmt>("comp");
>> auto *S = Result.Nodes.getNodeAs<SwitchStmt>("switch");
>>
>> This is where the string in the bind calls are changed to nodes
>>
>> `-FunctionDecl
>> |-ParmVarDecl
>> `-CompoundStmt <-------- comp, C
>> |-SwitchStmt <-------- switch, S
>> | |-ImplicitCastExpr
>> | | `-DeclRefExpr
>> | `-CompoundStmt
>> `-NullStmt <---------- looking for N
>>
>> And then more logic to find the NullStmt
>>
>> auto Current = C->body_begin();
>> auto Next = Current;
>> Next++;
>> while (Next != C->body_end()) {
>> if (*Current == S) {
>> if (const auto *N = dyn_cast<NullStmt>(*Next)) {
>>
>> When it is found, a warning is printed and a FixItHint is proposed.
>>
>> auto H = FixItHint::CreateReplacement(
>> SourceRange(S->getBody()->getEndLoc(), N->getSemiLoc()), "}");
>> diag(N->getSemiLoc(), "unneeded semicolon") << H;
>>
>> This fixit replaces from the end of switch to the semicolon with a
>> '}'. Because the end of the switch is '}' this has the effect of
>> removing all the whitespace as well as the semicolon.
>>
>> Because of the checker's placement in clang-tidy existing linuxkernel
>> checkers, all that was needed to fix the tree was to add a '-fix'to the
>> build's clang-tidy call.
> I wonder if there's a way to differentiate existing checks we'd prefer
> to run continuously vs newer noisier ones? Drowning in a sea of 10k
> -Wextra-semi-stmt doesn't sound like fun. Maybe a new target for make
> to differentiate reporting vs auto fixing?
>
>> I am looking for opinions on what we want to do specifically with
>> cleanups and generally about other source-to-source programmatic
>> changes to the code base.
>>
>> For cleanups, I think we need a new toplevel target
>>
>> clang-tidy-fix
> ah, yep, I agree. Though I'm curious now that I know that clang can
> be used as the driver to apply fixits rather than clang-tidy, how else
> we can leverage clang over manually writing clang-tidy checks. Unless
> I have something confused there?

Nope.

IMO clang fixits are too coarse and will never work treewide.

Comparing my recent treewide fixing of unneeded breaks and semi's, I would much rather write a tool

than manually look at or fix anything treewide.

Tom

>
>> And an explicit list of fixers that have a very high (100%?) fix rate.
>>
>> Ideally a bot should make the changes, but a bot could also nag folks.
>> Is there interest in a bot making the changes? Does one already exist?
> Most recently Joe sent a treewide fix for section attributes that
> Linux pulled just after the merge window closed, IIUC. Maybe that
> would be the best time, since automation makes it trivial for anyone
> to run the treewide fixit whenever.
>
>> The general source-to-source is a bit blue sky. Ex/ could automagicly
>> refactor api, outline similar cut-n-pasted functions etc. Anything on
>> someone's wishlist you want to try out ?
>>
>> Signed-off-by: Tom Rix <trix@xxxxxxxxxx>
>>
>> --