Here is one way you could do this, seeing how & is mixed in and is not a word character:
x <- c('A B C Company', 'XYZ Inc', 'S & K Co', 'A B C D E F G Company')
gsub('(?<!\\S\\S)\\s+(?=\\S(?!\\S))', '', x, perl=TRUE)
# [1] "ABC Company" "XYZ Inc" "S&K Co" "ABCDEFG Company"
Explanation:
First, a negative lookbehind asserts that the whitespace is not preceded by two consecutive non-whitespace characters. Then we match whitespace one or more times. Finally, a lookahead asserts that a single non-whitespace character follows, and that the character after it is not a non-whitespace character (that is, it is whitespace or the end of the string).
(?<!        # look behind to see if there is not:
  \S        #   non-whitespace (all but \n, \r, \t, \f, and " ")
  \S        #   non-whitespace (all but \n, \r, \t, \f, and " ")
)           # end of look-behind
\s+         # whitespace (\n, \r, \t, \f, and " ") (1 or more times)
(?=         # look ahead to see if there is:
  \S        #   non-whitespace (all but \n, \r, \t, \f, and " ")
  (?!       #   look ahead to see if there is not:
    \S      #     non-whitespace (all but \n, \r, \t, \f, and " ")
  )         #   end of look-ahead
)           # end of look-ahead
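To check which whitespace runs the pattern actually selects, one option is to ask for the match positions instead of replacing them. The sketch below reuses the pattern from above; the variable names (pattern, hits, starts) are only illustrative.

```r
# Same pattern as in the gsub() call above, stored for reuse.
pattern <- '(?<!\\S\\S)\\s+(?=\\S(?!\\S))'

x <- c('A B C Company', 'XYZ Inc', 'S & K Co')

# gregexpr() returns, per element, the start position of every match
# (or -1 when nothing matches). These are the spaces gsub() would delete.
hits <- gregexpr(pattern, x, perl = TRUE)

# as.integer() drops the match attributes, leaving plain positions.
starts <- lapply(hits, as.integer)
starts
# 'A B C Company' matches at positions 2 and 4; 'XYZ Inc' has no match.
```

The space before "Company" is not matched because the lookahead sees "C" followed by another non-whitespace character.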
Best Answer
In general, we want a solution that is vectorised, so here's a better test example:
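The test example itself is not reproduced in this text. As a stand-in, a hypothetical vector along these lines would exercise the cases that matter (plain spaces, other whitespace characters, nothing to remove, and a missing value); its contents are an assumption, not the original data.

```r
# Hypothetical test vector; the original example is not shown in this text.
x <- c(
  " x y ",          # spaces before, after and in between
  "\ta\nb\r\n",     # tab, newline and carriage return
  "no_whitespace",  # nothing to remove
  NA                # missing value
)
x
```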
The base R approach: gsub

gsub replaces all instances of a string (fixed = TRUE) or regular expression (fixed = FALSE, the default) with another string. To remove all spaces, pass a single space as the pattern and an empty string as the replacement, with fixed = TRUE.

As DWin noted, in this case fixed = TRUE isn't necessary but provides slightly better performance, since matching a fixed string is faster than matching a regular expression.

If you want to remove all types of whitespace, match a whitespace character class with a regular expression instead.
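A minimal sketch of both base R calls, assuming a small stand-in vector x (the result names are only illustrative):

```r
# Stand-in input: plain spaces, mixed whitespace, and a missing value.
x <- c(" x y ", "\ta\nb\r\n", NA)

# Literal spaces only; fixed = TRUE bypasses the regex engine.
no_spaces <- gsub(" ", "", x, fixed = TRUE)

# Any whitespace character, via the POSIX class or the \s shorthand.
no_ws_posix <- gsub("[[:space:]]", "", x)
no_ws_short <- gsub("\\s", "", x)

no_spaces
no_ws_posix
```

gsub is vectorised over x and leaves NA elements as NA.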
"[:space:]"
is a POSIX character class (not specific to R) matching all space characters; it is only valid inside a bracket expression, so the full pattern is "[[:space:]]". \s is a shorthand supported by most regular-expression engines that does the same thing; in an R string literal it is written "\\s".

The stringr approach: str_replace_all and str_trim

stringr provides more human-readable wrappers around the base R functions (though as of Dec 2014, the development version has a branch built on top of stringi, mentioned below). The equivalents of the above commands use str_replace_all. stringr also has a str_trim function, which removes only leading and trailing whitespace.
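As a rough sketch of the stringr versions, assuming the stringr package is installed and using the same kind of stand-in vector x (result names are illustrative):

```r
library(stringr)  # assumes stringr is installed

# Stand-in input: plain spaces, mixed whitespace, and a missing value.
x <- c(" x y ", "\ta\nb\r\n", NA)

# fixed() marks the pattern as a literal string rather than a regex.
no_spaces <- str_replace_all(x, fixed(" "), "")

# "\\s" matches any whitespace character.
no_ws <- str_replace_all(x, "\\s", "")

# str_trim() only touches the ends of each string.
trimmed <- str_trim(x)
```

Interior whitespace survives str_trim, so it is not a substitute for str_replace_all when every whitespace character has to go.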
The stringi approach: stri_replace_all_charclass and stri_trim

stringi is built upon the platform-independent ICU library, and has an extensive set of string manipulation functions. The equivalents of the above use stri_replace_all_charclass with the pattern "\\p{WHITE_SPACE}", which is an alternate syntax for the set of Unicode code points considered to be whitespace, equivalent to "[[:space:]]", "\\s" and space(). For more complex regular expression replacements, there is also stri_replace_all_regex. stringi also has trim functions.
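A short sketch of the stringi calls named above, assuming the stringi package is installed; the input vector and result names are illustrative only.

```r
library(stringi)  # assumes stringi is installed

# Stand-in input: plain spaces, mixed whitespace, and a missing value.
x <- c(" x y ", "\ta\nb\r\n", NA)

# Remove every code point in the Unicode White_Space set.
no_ws <- stri_replace_all_charclass(x, "\\p{WHITE_SPACE}", "")

# Regex variant, for patterns more complex than a single character class.
no_ws_regex <- stri_replace_all_regex(x, "\\s", "")

# Trim leading and trailing whitespace only.
trimmed <- stri_trim(x)
```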