Regular expressions are used for Pattern Matching.
To use in Excel follow these steps:
Step 1: Add VBA reference to "Microsoft VBScript Regular Expressions 5.5"
- Select "Developer" tab (I don't have this tab what do I do?)
- Select "Visual Basic" icon from 'Code' ribbon section
- In "Microsoft Visual Basic for Applications" window select "Tools" from the top menu.
- Select "References"
- Check the box next to "Microsoft VBScript Regular Expressions 5.5" to include in your workbook.
- Click "OK"
Step 2: Define your pattern
Basic definitions:
-
Range.
- E.g.
a-z
matches an lower case letters from a to z
- E.g.
0-5
matches any number from 0 to 5
[]
Match exactly one of the objects inside these brackets.
- E.g.
[a]
matches the letter a
- E.g.
[abc]
matches a single letter which can be a, b or c
- E.g.
[a-z]
matches any single lower case letter of the alphabet.
()
Groups different matches for return purposes. See examples below.
{}
Multiplier for repeated copies of pattern defined before it.
- E.g.
[a]{2}
matches two consecutive lower case letter a: aa
- E.g.
[a]{1,3}
matches at least one and up to three lower case letter a
, aa
, aaa
+
Match at least one, or more, of the pattern defined before it.
- E.g.
a+
will match consecutive a's a
, aa
, aaa
, and so on
?
Match zero or one of the pattern defined before it.
- E.g. Pattern may or may not be present but can only be matched one time.
- E.g.
[a-z]?
matches empty string or any single lower case letter.
*
Match zero or more of the pattern defined before it.
- E.g. Wildcard for pattern that may or may not be present.
- E.g.
[a-z]*
matches empty string or string of lower case letters.
.
Matches any character except newline \n
- E.g.
a.
Matches a two character string starting with a and ending with anything except \n
|
OR operator
- E.g.
a|b
means either a
or b
can be matched.
- E.g.
red|white|orange
matches exactly one of the colors.
^
NOT operator
- E.g.
[^0-9]
character can not contain a number
- E.g.
[^aA]
character can not be lower case a
or upper case A
\
Escapes special character that follows (overrides above behavior)
- E.g.
\.
, \\
, \(
, \?
, \$
, \^
Anchoring Patterns:
^
Match must occur at start of string
- E.g.
^a
First character must be lower case letter a
- E.g.
^[0-9]
First character must be a number.
$
Match must occur at end of string
- E.g.
a$
Last character must be lower case letter a
Precedence table:
Order Name Representation
1 Parentheses ( )
2 Multipliers ? + * {m,n} {m, n}?
3 Sequence & Anchors abc ^ $
4 Alternation |
Predefined Character Abbreviations:
abr same as meaning
\d [0-9] Any single digit
\D [^0-9] Any single character that's not a digit
\w [a-zA-Z0-9_] Any word character
\W [^a-zA-Z0-9_] Any non-word character
\s [ \r\t\n\f] Any space character
\S [^ \r\t\n\f] Any non-space character
\n [\n] New line
Example 1: Run as macro
The following example macro looks at the value in cell A1
to see if the first 1 or 2 characters are digits. If so, they are removed and the rest of the string is displayed. If not, then a box appears telling you that no match is found. Cell A1
values of 12abc
will return abc
, value of 1abc
will return abc
, value of abc123
will return "Not Matched" because the digits were not at the start of the string.
Private Sub simpleRegex()
Dim strPattern As String: strPattern = "^[0-9]{1,2}"
Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1")
If strPattern <> "" Then
strInput = Myrange.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
MsgBox (regEx.Replace(strInput, strReplace))
Else
MsgBox ("Not matched")
End If
End If
End Sub
Example 2: Run as an in-cell function
This example is the same as example 1 but is setup to run as an in-cell function. To use, change the code to this:
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "^[0-9]{1,3}"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
simpleCellRegex = regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "Not matched"
End If
End If
End Function
Place your strings ("12abc") in cell A1
. Enter this formula =simpleCellRegex(A1)
in cell B1
and the result will be "abc".
Example 3: Loop Through Range
This example is the same as example 1 but loops through a range of cells.
Private Sub simpleRegex()
Dim strPattern As String: strPattern = "^[0-9]{1,2}"
Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1:A5")
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
MsgBox (regEx.Replace(strInput, strReplace))
Else
MsgBox ("Not matched")
End If
End If
Next
End Sub
Example 4: Splitting apart different patterns
This example loops through a range (A1
, A2
& A3
) and looks for a string starting with three digits followed by a single alpha character and then 4 numeric digits. The output splits apart the pattern matches into adjacent cells by using the ()
. $1
represents the first pattern matched within the first set of ()
.
Private Sub splitUpRegexPattern()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1:A3")
For Each C In Myrange
strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"
If strPattern <> "" Then
strInput = C.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
C.Offset(0, 1) = regEx.Replace(strInput, "$1")
C.Offset(0, 2) = regEx.Replace(strInput, "$2")
C.Offset(0, 3) = regEx.Replace(strInput, "$3")
Else
C.Offset(0, 1) = "(Not matched)"
End If
End If
Next
End Sub
Results:
Additional Pattern Examples
String Regex Pattern Explanation
a1aaa [a-zA-Z][0-9][a-zA-Z]{3} Single alpha, single digit, three alpha characters
a1aaa [a-zA-Z]?[0-9][a-zA-Z]{3} May or may not have preceding alpha character
a1aaa [a-zA-Z][0-9][a-zA-Z]{0,3} Single alpha, single digit, 0 to 3 alpha characters
a1aaa [a-zA-Z][0-9][a-zA-Z]* Single alpha, single digit, followed by any number of alpha characters
</i8> \<\/[a-zA-Z][0-9]\> Exact non-word character except any single alpha followed by any single digit
Some examples of how to avoid select
Use Dim
'd variables
Dim rng as Range
Set
the variable to the required range. There are many ways to refer to a single-cell range:
Set rng = Range("A1")
Set rng = Cells(1, 1)
Set rng = Range("NamedRange")
Or a multi-cell range:
Set rng = Range("A1:B10")
Set rng = Range("A1", "B10")
Set rng = Range(Cells(1, 1), Cells(10, 2))
Set rng = Range("AnotherNamedRange")
Set rng = Range("A1").Resize(10, 2)
You can use the shortcut to the Evaluate
method, but this is less efficient and should generally be avoided in production code.
Set rng = [A1]
Set rng = [A1:B10]
All the above examples refer to cells on the active sheet. Unless you specifically want to work only with the active sheet, it is better to Dim a Worksheet
variable too:
Dim ws As Worksheet
Set ws = Worksheets("Sheet1")
Set rng = ws.Cells(1, 1)
With ws
Set rng = .Range(.Cells(1, 1), .Cells(2, 10))
End With
If you do want to work with the ActiveSheet
, for clarity it's best to be explicit. But take care, as some Worksheet
methods change the active sheet.
Set rng = ActiveSheet.Range("A1")
Again, this refers to the active workbook. Unless you specifically want to work only with the ActiveWorkbook
or ThisWorkbook
, it is better to Dim a Workbook
variable too.
Dim wb As Workbook
Set wb = Application.Workbooks("Book1")
Set rng = wb.Worksheets("Sheet1").Range("A1")
If you do want to work with the ActiveWorkbook
, for clarity it's best to be explicit. But take care, as many WorkBook
methods change the active book.
Set rng = ActiveWorkbook.Worksheets("Sheet1").Range("A1")
You can also use the ThisWorkbook
object to refer to the book containing the running code.
Set rng = ThisWorkbook.Worksheets("Sheet1").Range("A1")
A common (bad) piece of code is to open a book, get some data then close again
This is bad:
Sub foo()
Dim v as Variant
Workbooks("Book1.xlsx").Sheets(1).Range("A1").Clear
Workbooks.Open("C:\Path\To\SomeClosedBook.xlsx")
v = ActiveWorkbook.Sheets(1).Range("A1").Value
Workbooks("SomeAlreadyOpenBook.xlsx").Activate
ActiveWorkbook.Sheets("SomeSheet").Range("A1").Value = v
Workbooks(2).Activate
ActiveWorkbook.Close()
End Sub
And it would be better like:
Sub foo()
Dim v as Variant
Dim wb1 as Workbook
Dim wb2 as Workbook
Set wb1 = Workbooks("SomeAlreadyOpenBook.xlsx")
Set wb2 = Workbooks.Open("C:\Path\To\SomeClosedBook.xlsx")
v = wb2.Sheets("SomeSheet").Range("A1").Value
wb1.Sheets("SomeOtherSheet").Range("A1").Value = v
wb2.Close()
End Sub
Pass ranges to your Sub
s and Function
s as Range variables:
Sub ClearRange(r as Range)
r.ClearContents
'....
End Sub
Sub MyMacro()
Dim rng as Range
Set rng = ThisWorkbook.Worksheets("SomeSheet").Range("A1:B10")
ClearRange rng
End Sub
You should also apply Methods (such as Find
and Copy
) to variables:
Dim rng1 As Range
Dim rng2 As Range
Set rng1 = ThisWorkbook.Worksheets("SomeSheet").Range("A1:A10")
Set rng2 = ThisWorkbook.Worksheets("SomeSheet").Range("B1:B10")
rng1.Copy rng2
If you are looping over a range of cells it is often better (faster) to copy the range values to a variant array first and loop over that:
Dim dat As Variant
Dim rng As Range
Dim i As Long
Set rng = ThisWorkbook.Worksheets("SomeSheet").Range("A1:A10000")
dat = rng.Value ' dat is now array (1 to 10000, 1 to 1)
for i = LBound(dat, 1) to UBound(dat, 1)
dat(i,1) = dat(i, 1) * 10 ' Or whatever operation you need to perform
next
rng.Value = dat ' put new values back on sheet
This is a small taster for what's possible.
Best Answer
Thanks for the fun challenge (love recursion :)
Recursion in Excel has some limitations, but this might work for your needs.
Please define a named formula (
all_my_children
) in Name Manager (we need to use it for recursive formulas) in the example I have created a table for the data namedTree
.all_my_children=
And then to get the result, (entered in
E1
in the example):Please let know know if any explanation is needed.