Ruby Regex – Differences Between Lazy, Greedy, and Possessive Quantifiers

regex ruby ruby-1.9.3

How do the following quantifiers differ with respect to scenarios, speed, etc.?

  • ?, ?? and ?+ all match 0 or 1 times.
  • *, *? and *+ all match 0 or more times.
  • +, +? and ++ all match 1 or more times.

  • ?, * and + are greedy.
  • ??, *? and +? are reluctant/lazy.
  • ?+, *+ and ++ are possessive.

Can anyone help me to understand what these terms mean? Why are there three variations of each quantifier for the same job?

Best Answer

Take the string

aaaab

and see how the following regexes match it:

Regex          Submatches
               group 1  group 2  group 3
(a?)(a*)(ab)   a        aa       ab
(a??)(a*)(ab)  (empty)  aaa      ab
(a?+)(a*)(ab)  a        aa       ab
(a*)(a?)(ab)   aaa      (empty)  ab
(a*?)(a?)(ab)  aa       a        ab
(a*+)(a?)(ab)  aaaa              <Match fails!>
(a+)(a*)(ab)   aaa      (empty)  ab
(a+?)(a*)(ab)  a        aa       ab
(a++)(a*)(ab)  aaaa              <Match fails!>
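
The table can be reproduced with a short Ruby script. This is only a minimal sketch: the names SUBJECT, PATTERNS and captures_for are made up for illustration, and it assumes a Ruby version whose regex engine supports possessive quantifiers (the question is tagged ruby-1.9.3).

```ruby
# The test string and the nine patterns from the table above.
SUBJECT = "aaaab"

PATTERNS = [
  /(a?)(a*)(ab)/, /(a??)(a*)(ab)/, /(a?+)(a*)(ab)/,
  /(a*)(a?)(ab)/, /(a*?)(a?)(ab)/, /(a*+)(a?)(ab)/,
  /(a+)(a*)(ab)/, /(a+?)(a*)(ab)/, /(a++)(a*)(ab)/
].freeze

# Hypothetical helper: returns the captured groups as an array of strings,
# or nil when the overall match fails.
def captures_for(pattern, subject = SUBJECT)
  match = pattern.match(subject)
  match && match.captures
end

# Print one row per pattern; an empty capture shows up as "".
PATTERNS.each do |pattern|
  captures = captures_for(pattern)
  puts format("%-16s %s", pattern.source, captures ? captures.inspect : "match fails")
end
```

An empty capture is printed as "" while a failed match is reported as "match fails", which makes the difference between "matched nothing" and "did not match at all" visible.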

Explanation:

  • a? tries to match one a, but it's prepared to match nothing if that's necessary for the whole match to succeed.
  • a?? tries to match nothing, but it's prepared to match one a if that's necessary for the whole match to succeed.
  • a?+ tries to match one a. If it can do that, it will not back down to match nothing, even if that were necessary for the overall match to succeed. If it can't match an a, then it will gladly match nothing, though.
  • a* tries to match as many a's as it can, but it's prepared to match fewer a's, even none, if that's necessary for the whole match to succeed.
  • a*? tries to match nothing, but it's prepared to match as many a's as are absolutely necessary for the whole match to succeed, and no more.
  • a*+ tries to match as many a's as it can. If it can do that, it will not back down to match fewer a's, even if that were necessary for the overall match to succeed. If it can't match even a single a, then it will gladly match nothing, though.
  • a+ tries to match as many a's as it can, but it's prepared to match fewer a's (but at least one) if that's necessary for the whole match to succeed.
  • a+? tries to match only one a, but it's prepared to match as many a's as are absolutely necessary for the whole match to succeed, and no more.
  • a++ tries to match as many a's as it can. If it can do that, it will not back down to match fewer a's, even if that were necessary for the overall match to succeed. If it can't match even a single a, then the regex fails.
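
To see the "will not back down" behaviour in isolation, here is a small sketch on the same string. The helper name first_capture is hypothetical, and the atomic group (?>...) is not part of the answer above; it is included only as a comparison, since a possessive quantifier behaves like the same greedy quantifier wrapped in an atomic group.

```ruby
# Hypothetical helper: returns the text of group 1, or nil if the match fails.
def first_capture(pattern, subject)
  match = pattern.match(subject)
  match && match[1]
end

subject = "aaaab"

# Greedy: grabs "aaaa", then gives one a back so that "ab" can match.
puts first_capture(/(a*)ab/, subject).inspect       # => "aaa"

# Lazy: starts with nothing and grows one a at a time until "ab" can match.
# It ends up at the same place as the greedy version, by a different route.
puts first_capture(/(a*?)ab/, subject).inspect      # => "aaa"

# Possessive: grabs "aaaa" and refuses to give any back, so "ab" never matches.
puts first_capture(/(a*+)ab/, subject).inspect      # => nil

# Atomic group around a greedy quantifier: same refusal to backtrack.
puts first_capture(/((?>a*))ab/, subject).inspect   # => nil

# Possessive is harmless when nothing after it needs an a back.
puts first_capture(/(a*+)b/, subject).inspect       # => "aaaa"
```

The last line shows that a possessive quantifier only changes the outcome when the rest of the pattern would need some of the characters it has already consumed.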