Regex – What Do ‘Lazy’ and ‘Greedy’ Mean in Regular Expressions?


What do these two terms mean, explained in an understandable way?

Best Answer

Greedy will consume as much as possible. From http://www.regular-expressions.info/repeat.html we see the example of trying to match HTML tags with <.+>. Suppose you have the following:

<em>Hello World</em>

You may think that <.+> (. means any non-newline character and + means one or more) would match only the <em> and the </em>. In reality it is greedy: it runs from the first < to the last >, so it matches the whole of <em>Hello World</em> instead of what you wanted.

Making it lazy (<.+?>) prevents this. Adding the ? after the + tells it to repeat as few times as possible, so the match stops at the first > it comes across.
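To see the difference concretely, here is a minimal sketch in Python (chosen only for illustration; the variable names are mine, and the quantifier behaviour is the same in most regex flavours):

```python
import re

text = "<em>Hello World</em>"

# Greedy: .+ grabs as much as it can, then backs off only as far as
# the LAST '>' in the string.
greedy = re.findall(r"<.+>", text)

# Lazy: .+? grows one character at a time and stops at the FIRST '>'.
lazy = re.findall(r"<.+?>", text)

print(greedy)  # ['<em>Hello World</em>']
print(lazy)    # ['<em>', '</em>']
```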

I'd encourage you to download RegExr, a great tool that will help you explore Regular Expressions - I use it all the time.

Related Solutions

Regex Lazy vs Greedy – Is Lazy Worse?

Another thing to consider is how long the target text is, and how much of it is going to be matched by the quantified subexpression. For example, if you were trying to match the whole <BODY> element in a large HTML document, you might be tempted to use this regex:

/<BODY>.*?<\/BODY>/is

But that's going to do a whole lot of unnecessary work, matching one character at a time while effectively doing a negative lookahead before each one. You know the </BODY> tag is going to be very near the end of the document, so the smart thing to do is to use a normal greedy quantifier: let it slurp up the whole rest of the document and then backtrack the few characters necessary to match the end tag.
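As a rough sketch of that point, the Python snippet below builds a stand-in document (the filler content and variable names are hypothetical) and checks that the lazy and greedy patterns find the same text when there is only one closing tag. The /i and /s modifiers map to re.IGNORECASE and re.DOTALL here.

```python
import re

# Hypothetical stand-in for a large HTML document with a single BODY element.
document = "<html><BODY>" + "<p>filler</p>\n" * 1000 + "</BODY></html>"

flags = re.IGNORECASE | re.DOTALL  # equivalents of the /i and /s modifiers

# Lazy: advances one character at a time, testing for </BODY> at each step.
lazy = re.search(r"<BODY>.*?</BODY>", document, flags)

# Greedy: jumps to the end of the document, then backtracks to </BODY>.
greedy = re.search(r"<BODY>.*</BODY>", document, flags)

same_match = lazy.group() == greedy.group()
print(same_match)  # True
```

The two only agree because this document contains exactly one </BODY>; with several closing tags the greedy version would run to the last one. If you want to compare the cost of the two on your own data, the standard timeit module is one way to measure it.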

In most cases you won't notice any speed difference between greedy and reluctant quantifiers, but it's something to keep in mind. The main reason why you should be judicious in your use of reluctant quantifiers is the one that was pointed out by the others: they may do it reluctantly, but they will match more than you want them to if that's what it takes to achieve an overall match.
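A small Python sketch of that last point, using a made-up input string: the lazy quantifier would prefer to stop at the first </a>, but the pattern also requires a ! right after the closing tag, so it keeps expanding until the overall match succeeds.

```python
import re

# Hypothetical input: only the second </a> is followed by '!'.
text = "<a>1</a> <a>2</a>!"

# The match is anchored at the first <a>; .+? expands past the first
# </a> because no '!' follows it, and stops only at the second one.
match = re.search(r"<a>.+?</a>!", text)

print(match.group())  # <a>1</a> <a>2</a>!
```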

C# Regex – Greedy, Non-Greedy, and All-Greedy Matching Explained

You could use something like:

MatchCollection nonGreedyMatches = Regex.Matches("abcd", @"(((ab)c)d)");

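Then you should have three capture groups containing ab, abc and abcd. Groups are numbered by their opening parenthesis, so group 1 is abcd, group 2 is abc and group 3 is ab.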
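The same nested-group pattern can be checked quickly outside .NET. This is a sketch in Python rather than C#, so the API differs from Regex.Matches, but the group numbering follows the same left-to-right rule:

```python
import re

m = re.search(r"(((ab)c)d)", "abcd")

# groups() returns groups 1..n, ordered by opening parenthesis:
# the outermost group comes first, the innermost last.
groups = m.groups()

print(groups)  # ('abcd', 'abc', 'ab')
```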

But, to be honest, this kind of regex doesn't make much sense, and it becomes unreadable as it gets bigger.

Edit:

MatchCollection nonGreedyMatches = Regex.Matches("abcd", @"ab.?");

By the way, you have an error there: this pattern can only match ab or abc (read: ab plus one optional character).

The lazy version of:

MatchCollection greedyMatches    = Regex.Matches("abcd", @"ab.*");

is:

MatchCollection nonGreedyMatches    = Regex.Matches("abcd", @"ab.*?");
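For comparison, here is what the three patterns from this answer match against "abcd". This is a sketch in Python rather than C#, with variable names of my own, but these quantifiers behave the same way in both engines:

```python
import re

text = "abcd"

# Greedy .* takes everything that follows "ab".
greedy = re.findall(r"ab.*", text)

# Lazy .*? takes as little as possible, which here is nothing at all.
lazy = re.findall(r"ab.*?", text)

# .? is an optional single character; being greedy, it takes the "c".
optional = re.findall(r"ab.?", text)

print(greedy)    # ['abcd']
print(lazy)      # ['ab']
print(optional)  # ['abc']
```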
