I am trying to match all characters except the ones listed in the exclusion group. The symbol in the subject "some–text" is not a hyphen/minus sign; it is an en dash, Unicode character “–” (U+2013).
preg_match("/[^↓a-zA-Z0-9" . preg_quote(".\\+*?[^]$(){}=!<>|:#") . "~@%&_;'\",\\/ \r\nαβγδθλμπφΔΦΩØ°±≤≥↑∞⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉]/", "some–text");
The above code does not work as expected: it returns 0 instead of 1.
preg_match("/[^↓a-zA-Z0-9" . preg_quote(".\\+*?[^]$(){}=!<>|:#") . "~@%&_;'\",\\/ \r\nαβγδθλμπφΔΦΩØ°±≤≥↑∞⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉]/", "some`text");
In the above code, if I change the symbol in the subject to a backtick (`), it works and returns 1.
preg_match("/[^a-zA-Z0-9" . preg_quote(".\\+*?[^]$(){}=!<>|:#") . "~@%&_;'\",\\/ \r\nαβγδθλμπφΔΦΩØ°±≤≥↑∞⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉]/", "some–text");
If I remove the Downwards Arrow ↓ from the pattern and keep the en dash "–" in the subject, it starts working and the above code returns 1.
preg_match("/[^↓]/", "some–text");
If I remove all other characters and keep only the Downwards Arrow ↓ in the exclusion group, it works and returns 1, so the Downwards Arrow symbol on its own is not causing the problem.
Can somebody tell me what is going on? I just want to match all characters except these:
a-zA-Z0-9~@%&_;'\",/ \r\nαβγδθλμπφΔΦΩØ°±≤≥↓↑∞⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉
which includes space and newline characters \r and \n
Best Answer
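To handle Unicode characters in PHP, you must add the 'u' modifier to your regular expression. Without it, PCRE treats both the pattern and the subject as sequences of single bytes, so every multibyte character in the exclusion group contributes its individual UTF-8 bytes rather than one character. The en dash is the bytes E2 80 93; ↓ (E2 86 93) supplies E2 and 93, and ₀ (E2 82 80) supplies 80, so every byte of the en dash is already in the exclusion group and nothing matches. That is also why removing ↓ makes the match succeed: it is the only character in the list that contains the byte 93. The 'preg_quote' function escapes the regex special characters in the exclusion list so that they are treated as literals in the pattern.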
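As a rough illustration of why the modifier matters, the sketch below dumps the UTF-8 bytes of the en dash and of two characters from the exclusion list, then runs the same tiny pattern with and without 'u'. The variable names are only illustrative, and the "\u{...}" escapes assume PHP 7.0 or later.

```php
<?php
// Raw UTF-8 bytes of the characters involved.
$enDash    = "\u{2013}"; // – EN DASH (the subject character)
$downArrow = "\u{2193}"; // ↓ DOWNWARDS ARROW (in the exclusion list)
$subZero   = "\u{2080}"; // ₀ SUBSCRIPT ZERO (in the exclusion list)

echo bin2hex($enDash), "\n";    // e28093
echo bin2hex($downArrow), "\n"; // e28693
echo bin2hex($subZero), "\n";   // e28280

// Without /u the class is a set of bytes: E2, 86, 93, 82 and 80.
// All three bytes of the en dash are in it, so nothing can match.
$withoutU = preg_match("/[^" . $downArrow . $subZero . "]/", $enDash);

// With /u the class holds two code points, and the en dash is neither.
$withU = preg_match("/[^" . $downArrow . $subZero . "]/u", $enDash);

var_dump($withoutU); // int(0)
var_dump($withU);    // int(1)
```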
The following code will help you to match all the characters except the ones mentioned in the exclusion group including the en dash.
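A minimal sketch of the corrected call, assuming the source file is saved as UTF-8 and PHP 7.0 or later for the "\u{2013}" escape. The exclusion list is copied from the question; the variable names ($allowedSpecial, $allowedOther, $pattern) are only illustrative.

```php
<?php
// Regex metacharacters that are allowed in the text, escaped as literals.
// Passing "/" as the second argument also escapes the pattern delimiter.
$allowedSpecial = preg_quote(".\\+*?[^]$(){}=!<>|:#", "/");

// Remaining allowed characters, including space, \r and \n.
$allowedOther = "~@%&_;'\",\\/ \r\nαβγδθλμπφΔΦΩØ°±≤≥↓↑∞⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉";

// The trailing "u" makes PCRE read pattern and subject as UTF-8 characters.
$pattern = "/[^a-zA-Z0-9" . $allowedSpecial . $allowedOther . "]/u";

var_dump(preg_match($pattern, "some\u{2013}text")); // int(1): en dash is not allowed
var_dump(preg_match($pattern, "some`text"));        // int(1): backtick is not allowed
var_dump(preg_match($pattern, "some ↓ text"));      // int(0): every character is allowed
```

Note that with the 'u' modifier preg_match returns false, not 0, when the subject is not valid UTF-8, so it is worth comparing the result with === 1 rather than treating it as a plain boolean.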