Regex – How to Interpret the “@” Symbol Before Regex in Awk on Linux

awk linux regex

For the input file patterns.txt, filter lines containing three or more
occurrences of "ar" and replace the third-to-last "ar" with "X".

par car tar far Cart

part cart mart

Expected output

par car tX far Cart

pXt cart mart

awk 'BEGIN{r = @/(.*)ar((.*ar){2})/} $0~r{print gensub(r, "\\1X\\2", 1)}' patterns.txt

There is one thing I cannot understand: what does the "@" mean in the BEGIN block?

Best Answer

A little bit of background re: standard regex constants.

Without the @ prefix:

##### this:

r = /(.*)ar((.*ar){2})/

##### is comparable to this:

r = ($0 ~ /(.*)ar((.*ar){2})/)      # assign 'r' the value of the comparison, ie, r = 0 (false) or 1 (true)

NOTE: if r = /(.*)ar((.*ar){2})/ is performed in the BEGIN block (where $0 is empty because no input has been read yet), you'll always end up with r = 0 (false).
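
This behaviour is easy to check. A minimal sketch, using the shorter pattern /ar/ purely for illustration (the variable names in_begin and in_rule are made up for this example); it should behave the same way in any POSIX awk:

```shell
# In BEGIN no record has been read, so $0 is empty and /ar/ evaluates to 0.
in_begin=$(awk 'BEGIN { r = /ar/; print r }')

# In a main rule the bare regex is matched against the current record.
in_rule=$(echo "par" | awk '{ r = /ar/; print r }')

echo "BEGIN: $in_begin, main rule: $in_rule"
```

In both cases r ends up holding a number, not the pattern itself.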

The obvious objective of this line of code is to assign a regex pattern to the variable r for use later in the script.

In GNU awk there are a couple of approaches for assigning a regex pattern to a variable:

dynamic regexps: r = "(.*)ar((.*ar){2})"
strongly typed regex constant: r = @/(.*)ar((.*ar){2})/

So, to answer OP's question, r = @/.../ is one approach (a strongly typed regex constant, available in GNU awk 4.2 and later) for assigning a regexp to a variable.
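
Both approaches can be tried side by side. The sketch below assumes gawk 4.2 or later (needed for @/.../ and typeof()) and recreates patterns.txt from the sample input in the question; the variable names s, t, types, dynamic and typed are illustrative only.

```shell
printf 'par car tar far Cart\npart cart mart\n' > patterns.txt

# typeof() reports how gawk classifies each variable.
types=$(gawk 'BEGIN {
    s = "(.*)ar((.*ar){2})"      # dynamic regexp held in a string
    t = @/(.*)ar((.*ar){2})/     # strongly typed regex constant
    print typeof(s), typeof(t)
}')
echo "$types"

# Either kind of variable can be used with ~ and gensub().
dynamic=$(gawk 'BEGIN { r = "(.*)ar((.*ar){2})" } $0 ~ r { print gensub(r, "\\1X\\2", 1) }' patterns.txt)
typed=$(gawk 'BEGIN { r = @/(.*)ar((.*ar){2})/ } $0 ~ r { print gensub(r, "\\1X\\2", 1) }' patterns.txt)
echo "$dynamic"
echo "$typed"
```

With the string form, any backslash in the pattern has to be doubled (for example "\\." for a literal dot); the @/.../ form is written exactly like an ordinary regex constant.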

Related Solutions

AWK – Print All Columns from nth to Last

Print all columns:

awk '{print $0}' somefile

Print all but the first column:

awk '{$1=""; print $0}' somefile

Print all but the first two columns:

awk '{$1=$2=""; print $0}' somefile
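
Note that assigning "" to a field only blanks it: the separators stay in place, so the output begins with leading spaces. A loop over the remaining fields avoids that. A minimal sketch, where the variable n (the first column to keep) and the sample input are assumptions made for illustration:

```shell
# Print fields n..NF, separated by OFS and terminated by ORS.
from_third=$(echo "a b c d e" | awk -v n=3 '{
    for (i = n; i <= NF; i++)
        printf "%s%s", $i, (i < NF ? OFS : ORS)
}')
echo "$from_third"
```

Lines with fewer than n fields produce no output at all, and runs of whitespace between fields are collapsed to a single OFS.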

Regex – Matching Up to the First Occurrence of a Character

You need

/^[^;]*/

[^;] is a negated character class; it matches any character except a semicolon.

^ (start of line anchor) is added to the beginning of the regex so only the first match on each line is captured. This may or may not be required, depending on whether possible subsequent matches are desired.

To cite the perlre manpage:

You can specify a character class, by enclosing a list of characters in [] , which will match any character from the list. If the first character after the "[" is "^", the class matches any character not in the list.

This should work in most regex dialects.

Note: The pattern will match everything up to the first semicolon, but excluding the semicolon. Also, it will match the whole line if there is no semicolon. If you want the semicolon included in the match, add a semicolon at the end of the pattern.
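
Both variants can be tried with awk's match() function, which sets RSTART and RLENGTH for the matched text. A small sketch; the sample line and the variable names before and through are made up for illustration:

```shell
line='key=value; rest; more'

# Everything before the first semicolon.
before=$(echo "$line" | awk 'match($0, /^[^;]*/) { print substr($0, RSTART, RLENGTH) }')

# Same, with the semicolon included in the match.
through=$(echo "$line" | awk 'match($0, /^[^;]*;/) { print substr($0, RSTART, RLENGTH) }')

echo "$before"
echo "$through"
```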
