How can I use regular expressions in Excel and take advantage of Excel's powerful grid-like setup for data manipulation?
- In-cell function to return a matched pattern or replaced value in a string.
- Sub to loop through a column of data and extract matches to adjacent cells.
- What setup is necessary?
- What are Excel's special characters for Regular expressions?
I understand Regex is not ideal for many situations (To use or not to use regular expressions?) since excel can use Left
, Mid
, Right
, Instr
type commands for similar manipulations.
Best Answer
Regular expressions are used for Pattern Matching.
To use in Excel follow these steps:
Step 1: Add VBA reference to "Microsoft VBScript Regular Expressions 5.5"
Step 2: Define your pattern
Basic definitions:
-
Range.a-z
matches an lower case letters from a to z0-5
matches any number from 0 to 5[]
Match exactly one of the objects inside these brackets.[a]
matches the letter a[abc]
matches a single letter which can be a, b or c[a-z]
matches any single lower case letter of the alphabet.()
Groups different matches for return purposes. See examples below.{}
Multiplier for repeated copies of pattern defined before it.[a]{2}
matches two consecutive lower case letter a:aa
[a]{1,3}
matches at least one and up to three lower case lettera
,aa
,aaa
+
Match at least one, or more, of the pattern defined before it.a+
will match consecutive a'sa
,aa
,aaa
, and so on?
Match zero or one of the pattern defined before it.[a-z]?
matches empty string or any single lower case letter.*
Match zero or more of the pattern defined before it.[a-z]*
matches empty string or string of lower case letters..
Matches any character except newline\n
a.
Matches a two character string starting with a and ending with anything except\n
|
OR operatora|b
means eithera
orb
can be matched.red|white|orange
matches exactly one of the colors.^
NOT operator[^0-9]
character can not contain a number[^aA]
character can not be lower casea
or upper caseA
\
Escapes special character that follows (overrides above behavior)\.
,\\
,\(
,\?
,\$
,\^
Anchoring Patterns:
^
Match must occur at start of string^a
First character must be lower case lettera
^[0-9]
First character must be a number.$
Match must occur at end of stringa$
Last character must be lower case lettera
Precedence table:
Predefined Character Abbreviations:
Example 1: Run as macro
The following example macro looks at the value in cell
A1
to see if the first 1 or 2 characters are digits. If so, they are removed and the rest of the string is displayed. If not, then a box appears telling you that no match is found. CellA1
values of12abc
will returnabc
, value of1abc
will returnabc
, value ofabc123
will return "Not Matched" because the digits were not at the start of the string.Example 2: Run as an in-cell function
This example is the same as example 1 but is setup to run as an in-cell function. To use, change the code to this:
Place your strings ("12abc") in cell
A1
. Enter this formula=simpleCellRegex(A1)
in cellB1
and the result will be "abc".Example 3: Loop Through Range
This example is the same as example 1 but loops through a range of cells.
Example 4: Splitting apart different patterns
This example loops through a range (
A1
,A2
&A3
) and looks for a string starting with three digits followed by a single alpha character and then 4 numeric digits. The output splits apart the pattern matches into adjacent cells by using the()
.$1
represents the first pattern matched within the first set of()
.Results:
Additional Pattern Examples