Re: [RFC] globmatch() helper function

From: George Spelvin
Date: Thu Dec 18 2008 - 03:55:42 EST


Eureka!

Never mind all of this angst; I've figured out a non-recursive way
to do it. Thanks for pushing me to think a bit harder and find it.
(I was actually looking for an example of the exponential pathology I
kept referring to when it dawned on me that it's actually impossible
without the additional expressive power of regexps.)

Somewhat abusing TeX's line-breaking terminology, consider a shell glob to
consist of a series of "boxes" and "glue" A box is a run of character
classes (of which literal characters and ? are degenerate cases). The
point is, a box has a well-defined width, the number of characters in the
string that it must match.

A *, on the other hand, is infinitely elastic "glue" between boxes that
matches an unknown number of characters.

So consider a pattern of the form box1*box2*box3*...
Assuming box1 has matched, we search forward, trying various widths of
the glue *, to find a point where box2 matches. Then we start searching
for the width of the second *. But! If we can't match the tail of the
pattern box3*... at any position to the right of the end of box2, there
is no point trying to move box2 forward and search again. Any solution
where box2*box3*... would match starting k characters later in the string
would also be a match with box2 (only) shifted k characters left, i.e.
in its original position.

So we can match boxes greedily: once we've found the leftmost place
where one matches (that is still to the right of all previous boxes),
we never need to try any other positions. Moving a box to the right
can never produce a match which doesn't exist in its previous position.

I'll rewrite it to be non-recusrive and quadratic in the worst case.


Thanks; I feel stupid for not maving looked this up, but smart for having
thought of it myself.

(For a discussion of how to get exponential backtracking in a regular
expression, see http://swtch.com/~rsc/regexp/regexp1.html)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/