C# Regular Expressions – Using Groups Effectively

c# regex

I've inherited a code block that contains the following regex and I'm trying to understand how it's getting its results.

var pattern = @"\[(.*?)\]";
var matches = Regex.Matches(user, pattern);
if (matches.Count > 0 && matches[0].Groups.Count > 1)
    ...

For the input user == "Josh Smith [jsmith]":

matches.Count == 1
matches[0].Value == "[jsmith]"

… which I understand. But then:

matches[0].Groups.Count == 2
matches[0].Groups[0].Value == "[jsmith]"
matches[0].Groups[1].Value == "jsmith" <=== how?

From what I understand of this question, the Groups collection stores the entire match as well as the previous match. But doesn't the regex above match only [open square bracket] [text] [close square bracket]? So why would "jsmith" match?

Also, is it always the case that the Groups collection will store exactly 2 groups: the entire match and the last match?

Best Answer

  • match.Groups[0] is always the same as match.Value, which is the entire match.
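  • match.Groups[1] is the first capturing group in your regular expression, that is, the text matched by the parenthesized (.*?). The brackets sit outside the parentheses, which is why it is "jsmith" rather than "[jsmith]".
  • match.Groups.Count is not always 2: it is one (for the entire match) plus the number of capturing groups in the pattern.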
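
To check this against the input from the question, here is a minimal sketch. It assumes a console program using top-level statements (C# 9 or later), and the variable names first and noGroups are mine, not from the original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Input and pattern taken from the question.
var user = "Josh Smith [jsmith]";
var pattern = @"\[(.*?)\]";

Match first = Regex.Match(user, pattern);

// Groups[0] is the whole match, brackets included.
Console.WriteLine(first.Groups[0].Value);   // [jsmith]

// Groups[1] is only what the parenthesized (.*?) matched.
Console.WriteLine(first.Groups[1].Value);   // jsmith

// One capturing group in the pattern, plus the whole match.
Console.WriteLine(first.Groups.Count);      // 2

// The same pattern without parentheses has only Groups[0].
Match noGroups = Regex.Match(user, @"\[.*?\]");
Console.WriteLine(noGroups.Groups.Count);   // 1
```

So the count of 2 in the question comes from the single pair of parentheses in the pattern, not from a fixed "entire match plus last match" rule.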

Consider this example:

var pattern = @"\[(.*?)\](.*)";
var match = Regex.Match("ignored [john] John Johnson", pattern);

In this case,

  • match.Value is "[john] John Johnson"
  • match.Groups[0] is always the same as match.Value, "[john] John Johnson".
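  • match.Groups[1] is the group of captures from the (.*?), here "john".
  • match.Groups[2] is the group of captures from the (.*), here " John Johnson" (including the leading space).
  • match.Groups[1].Captures is yet another dimension: the list of every substring that group captured, which matters once a group repeats.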
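
The values for this two-group example can be printed with a short sketch like the one below. It again assumes top-level statements (C# 9 or later); the single quotes in the output are only there to make the leading space in the second group visible.

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = @"\[(.*?)\](.*)";
Match match = Regex.Match("ignored [john] John Johnson", pattern);

// The match starts at the first '[', so "ignored " is not part of it.
Console.WriteLine($"'{match.Value}'");             // '[john] John Johnson'

// First pair of parentheses: the text between the brackets.
Console.WriteLine($"'{match.Groups[1].Value}'");   // 'john'

// Second pair of parentheses: everything after the closing bracket.
Console.WriteLine($"'{match.Groups[2].Value}'");   // ' John Johnson'

// Two capturing groups plus the whole match.
Console.WriteLine(match.Groups.Count);             // 3

// Neither group repeats, so each has exactly one capture.
Console.WriteLine(match.Groups[1].Captures.Count); // 1
```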

Consider another example:

var pattern = @"(\[.*?\])+";
var match = Regex.Match("[john][johnny]", pattern);

Note that we are looking for one or more bracketed names in a row. You need to be able to get each name separately. Enter Captures!

  • match.Groups[0] is always the same as match.Value, "[john][johnny]".
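  • match.Groups[1] is the group of captures from the (\[.*?\])+. Because the group repeats, match.Groups[1].Value holds only its last capture, "[johnny]", not the whole match.
  • match.Groups[1].Captures holds every capture of that group in order; here Captures.Count is 2 and the index is zero-based.
  • match.Groups[1].Captures[0] is [john]
  • match.Groups[1].Captures[1] is [johnny], the same as match.Groups[1].Value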
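
Running the repeated-group example is the easiest way to see how Value and Captures relate. The sketch below assumes top-level statements (C# 9 or later), and the variable name names is mine. Note that Captures is zero-based and has two entries for this input, so indexing Captures[2] would throw.

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = @"(\[.*?\])+";
Match match = Regex.Match("[john][johnny]", pattern);

Group names = match.Groups[1];

// The whole match spans both bracketed names.
Console.WriteLine(match.Value);           // [john][johnny]

// A repeated group reports only its last capture as its Value.
Console.WriteLine(names.Value);           // [johnny]

// Captures keeps every iteration of the group, in order.
Console.WriteLine(names.Captures.Count);  // 2
foreach (Capture capture in names.Captures)
{
    // Index is the position of the capture in the input string.
    Console.WriteLine($"{capture.Index}: {capture.Value}");
}
// 0: [john]
// 6: [johnny]
```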