PHP Preg_match – Solving Issues with Symbols in Patterns


I am trying to match all characters except the ones listed in the exclusion group. The symbol in the subject "some–text" is not a hyphen/minus sign; it is the EN DASH Unicode character "–" (U+2013).

preg_match("/[^↓a-zA-Z0-9" . preg_quote(".\\+*?[^]$(){}=!<>|:#") . "~@%&_;'\",\\/ \r\nαβγδθλμπφΔΦΩØ°±≤≥↑∞⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉]/", "some–text");

The above code does not work as expected: it returns 0 instead of 1.

preg_match("/[^↓a-zA-Z0-9" . preg_quote(".\\+*?[^]$(){}=!<>|:#") . "~@%&_;'\",\\/ \r\nαβγδθλμπφΔΦΩØ°±≤≥↑∞⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉]/", "some`text");

In the above code, if I change the symbol in the subject to a backtick (`), it works and returns 1.

preg_match("/[^a-zA-Z0-9" . preg_quote(".\\+*?[^]$(){}=!<>|:#") . "~@%&_;'\",\\/ \r\nαβγδθλμπφΔΦΩØ°±≤≥↑∞⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉]/", "some–text");

If I remove the Downwards Arrow ↓ from the pattern and keep the en dash "–" in the subject, it starts working and the above code returns 1.

preg_match("/[^↓]/", "some–text");

If I remove all other characters and keep only the Downwards Arrow ↓ in the exclusion group, it works and returns 1, so it is not the Downwards Arrow symbol alone that causes the problem.

Can somebody tell me what is going on? I just want to match all characters except these:

a-zA-Z0-9~@%&_;'\",/ \r\nαβγδθλμπφΔΦΩØ°±≤≥↓↑∞⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉

which includes the space character and the line-break characters \r and \n.

Best Answer

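To handle Unicode characters in PHP, you must add the 'u' modifier to your regular expression. Without it, PCRE treats both the pattern and the subject as plain bytes, so every multibyte UTF-8 character in the character class adds its individual bytes to the exclusion set. The en dash is encoded as the three bytes E2 80 93. The Downwards Arrow ↓ (U+2193) is E2 86 93 and the subscript zero ₀ (U+2080) is E2 82 80, so with ↓ in the class every byte of the en dash is excluded and nothing in "some–text" can match. No other character in your list contains the byte 93, which is why removing ↓ makes the match succeed, while a class containing only ↓ still matches the ASCII letters. With 'u', the class is a set of characters rather than bytes, and the en dash, which is not in it, matches as expected.

Also, the 'preg_quote' function escapes special characters so that they are treated as literals in the regex pattern, but it escapes the hyphen as well: preg_quote("a-z") returns a\-z, which means the three literals a, - and z instead of a range. Ranges such as a-zA-Z0-9 must therefore stay outside the 'preg_quote' call, and the delimiter should be passed as its second argument so that / is escaped too.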
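A minimal sketch of the byte-level behaviour described above, assuming the script is saved as UTF-8 (as the question's code evidently is). The two-character class [^↓₀] is a reduced stand-in for the full exclusion list, and the variable names are illustrative only; it is just enough to reproduce the 0/1 flip seen in the question.

```php
<?php
// Assumes this file is saved as UTF-8, so the literals below are UTF-8 byte sequences.
$enDash = "–"; // EN DASH, U+2013

// Inspect the raw bytes PCRE sees when the /u modifier is absent.
echo bin2hex("–"), "\n"; // e28093
echo bin2hex("↓"), "\n"; // e28693
echo bin2hex("₀"), "\n"; // e28280

// Byte mode: [^↓₀] excludes the bytes E2 86 93 82 80, which cover all of E2 80 93.
$byteMode = preg_match('/[^↓₀]/', $enDash);
// Dropping ↓ removes byte 0x93 from the set, so the last byte of the en dash matches.
$byteModeNoArrow = preg_match('/[^₀]/', $enDash);
// UTF-8 mode: the class holds the code points U+2193 and U+2080, not the en dash.
$utf8Mode = preg_match('/[^↓₀]/u', $enDash);

var_dump($byteMode, $byteModeNoArrow, $utf8Mode); // int(0) int(1) int(1)
```

The middle case mirrors the question's observation that removing ↓ "fixes" the pattern: it only works by accident, because a stray byte of the en dash is no longer excluded.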

The following code matches any character that is not in the exclusion list. The en dash is not part of that list, so "some–text" is reported as matched.

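// Ranges are inserted verbatim: preg_quote() would escape "-" and break them.
$ranges = "a-zA-Z0-9";
// Literal characters to exclude (the en dash is deliberately NOT in this list).
$literals = "~@%&_;'\",/ \r\nαβγδθλμπφΔΦΩØ°±≤≥↓↑∞⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉";
// The 'u' modifier makes PCRE work on UTF-8 characters instead of bytes.
$pattern = "/[^" . $ranges . preg_quote($literals, '/') . "]/u";
$subject = "some–text";

if (preg_match($pattern, $subject)) {
    echo "Matched";
} else {
    echo "Not matched";
}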
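A second sketch, under the same UTF-8 source-file assumption, shows the preg_quote pitfall in isolation and wraps the corrected construction in a small helper. The name buildExclusionPattern is hypothetical, not a PHP built-in; the three subjects mirror the en dash and backtick cases from the question, plus a string made only of excluded characters.

```php
<?php
// Assumes the file is saved as UTF-8.

// preg_quote() escapes "-", so a quoted range becomes three literals: a, -, z.
echo preg_quote('a-z', '/'), "\n"; // a\-z
$quotedRangeMatchesB = preg_match('/[^' . preg_quote('a-z', '/') . ']/u', 'b');
var_dump($quotedRangeMatchesB); // int(1): "b" was meant to be excluded but is not

// Hypothetical helper: ranges are inserted verbatim, literals go through
// preg_quote() with the delimiter, and /u switches PCRE to UTF-8 mode.
function buildExclusionPattern(string $ranges, string $literals): string
{
    return '/[^' . $ranges . preg_quote($literals, '/') . ']/u';
}

$pattern = buildExclusionPattern(
    'a-zA-Z0-9',
    "~@%&_;'\",/ \r\nαβγδθλμπφΔΦΩØ°±≤≥↓↑∞⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉"
);

$results = [];
foreach (["some–text", "some`text", "some text"] as $subject) {
    // preg_match() returns 1 if any character outside the exclusion list is found.
    $results[$subject] = preg_match($pattern, $subject);
    echo $subject, ' => ', $results[$subject], "\n";
}
```

Both the en dash and the backtick subjects return 1, while "some text" returns 0 because every character in it (letters and the space) is in the exclusion list.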
