Bash Shell Scripting – How to Delete All the Lines That Match and One After Each of Them?

awk bash grep sed shell

I have a large file and a list of specific strings. The output should contain neither the lines that match those strings nor the line immediately following each match. Two consecutive matches are impossible because of the structure of the file I want to filter. For example:

Specific lines:

'ggg'
'sss'

Input:

'ggg'
'123'
'rrr'
'321'
'sss'
'666'

Output:

'rrr'
'321'

A simple grep -v -A 1 does not work.
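
A minimal reproduction of that attempt, as a sketch: the options -x, -F and -f (whole-line, fixed-string matching against a pattern file) and the scratch directory are assumptions, since the exact command is not given above. Because -v inverts the selection, -A 1 adds context after the lines that are kept, not after the lines that are removed, so the lines following each match are still printed.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory; the file names are illustrative only.
workdir=$(mktemp -d)

printf '%s\n' "'ggg'" "'sss'" > "$workdir/lines"
printf '%s\n' "'ggg'" "'123'" "'rrr'" "'321'" "'sss'" "'666'" > "$workdir/input"

# -v selects the NON-matching lines; -A 1 then prints one line of context
# after each selected line, which can bring a matching line back in.
grep_out=$(grep -v -A 1 -x -F -f "$workdir/lines" "$workdir/input")
printf '%s\n' "$grep_out"

rm -rf "$workdir"
```

With GNU grep this should drop only the leading 'ggg' and keep everything else, including 'sss' (pulled back in as context after '321'), which is not the desired output.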

Best Answer

Assumptions:

  • we are looking for exact whole-line matches, including white space, punctuation marks and quotes
  • matches can occur on consecutive lines, in which case we ignore all of the matching lines plus the next non-matching line (NOTE: OP has added a comment stating that consecutive matches are not possible; see the end of the answer for a simplified awk script)

General approach:

  • if we find a matching line then we ignore the current line and set a flag to ignore the next line
  • if the flag is set we ignore the current line and clear the flag
  • otherwise we print the current line

Sample input file:

$ cat input
'ggg'                       # match/ignore and 
'123'                       # ignore
'rrr'
'321'
'sss'                       # match/ignore and 
'666'                       # ignore
'aaa' 'ggg' 'xxx'
'12345'
'xxx'                       # match/ignore and
'xxx'                       # match/ignore and
98352                       # ignore
'xyz'
hello world

Sample set of lines to match on (and ignore):

$ cat lines
'ggg'              # will not match on the line: 'aaa' 'ggg' 'xxx'
'sss'
rrr                # will not match on 'rrr' because of the missing quotes
'xxx'              # will match on consecutive lines and skip the next non-matching line

NOTE: the comments are for illustration only; they do not exist in the actual files

One awk idea:

awk '
#### 1st file:

FNR==NR { a[$0];  next }       # save line as index in array a[]

#### 2nd file:

$0 in a { skip=1; next }       # if line is an index in array then set the "skip" flag and ignore this line

skip    { skip=0; next }       # if flag is set then clear flag and ignore this line

1                              # otherwise print current line
' lines input

######
# or as a one-liner

awk 'FNR==NR {a[$0];next} $0 in a {skip=1;next} skip {skip=0;next} 1' lines input

This generates:

'rrr'
'321'
'aaa' 'ggg' 'xxx'
'12345'
'xyz'
hello world
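
Since the sample files are shown with comments that are not really in them, a self-contained way to reproduce the result may help. This is a sketch only: the scratch directory and the variable names (workdir, filtered) are assumptions, while the file contents and the awk one-liner are taken from above.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory so the test does not touch real files.
workdir=$(mktemp -d)

# The "lines" file: one exact line to match per row, no comments.
printf '%s\n' "'ggg'" "'sss'" "rrr" "'xxx'" > "$workdir/lines"

# The "input" file, again without the explanatory comments.
printf '%s\n' "'ggg'" "'123'" "'rrr'" "'321'" "'sss'" "'666'" \
    "'aaa' 'ggg' 'xxx'" "'12345'" "'xxx'" "'xxx'" "98352" "'xyz'" \
    "hello world" > "$workdir/input"

# Same one-liner as above: drop each matching line and the next non-matching line.
filtered=$(awk 'FNR==NR {a[$0];next} $0 in a {skip=1;next} skip {skip=0;next} 1' \
    "$workdir/lines" "$workdir/input")
printf '%s\n' "$filtered"

rm -rf "$workdir"
```

One caveat worth keeping in mind: the FNR==NR test assumes the first file is not empty; with an empty list of lines, the second file would be treated as the list and nothing would be printed.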

NOTE: if these assumptions are wrong and/or this does not work for OP's actual files, the question will need to be updated with a more representative set of data.


OP has added a comment stating consecutive line matches cannot occur. This allows us to simplify the code a bit:

awk '
FNR==NR { a[$0];   next }       # 1st file: save line as index in array a[]
$0 in a { getline; next }       # 2nd file: on a match, read (and discard) the following line, then skip to the next input line
1                               # otherwise print current line
' lines input

######
# or as a one-liner

awk 'FNR==NR {a[$0];next} $0 in a {getline;next} 1' lines input

If we remove one of the 'xxx' lines from the input file this will generate:

'rrr'
'321'
'aaa' 'ggg' 'xxx'
'12345'
'xyz'
hello world
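
To illustrate why the getline variant depends on the "no consecutive matches" guarantee, here is a small comparison of the two scripts on input that does contain consecutive matches. It is a sketch; the scratch directory and the variable names flag_out and getline_out are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory and a deliberately tiny data set.
workdir=$(mktemp -d)

printf '%s\n' "'xxx'" > "$workdir/lines"
printf '%s\n' "'xxx'" "'xxx'" "98352" "'xyz'" > "$workdir/input"

# Flag-based version: both matches are dropped, then the next non-matching line.
flag_out=$(awk 'FNR==NR {a[$0];next} $0 in a {skip=1;next} skip {skip=0;next} 1' \
    "$workdir/lines" "$workdir/input")

# getline version: the second match is consumed as "the line after the first match",
# so the line that follows it is no longer skipped.
getline_out=$(awk 'FNR==NR {a[$0];next} $0 in a {getline;next} 1' \
    "$workdir/lines" "$workdir/input")

printf 'flag version:\n%s\n' "$flag_out"
printf 'getline version:\n%s\n' "$getline_out"

rm -rf "$workdir"
```

The flag version prints only 'xyz', while the getline version also prints 98352, so the simplified script is only equivalent when matches never sit on adjacent lines.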