Bash Shell Scripting – How to Delete All the Lines That Match and One After Each of Them?


I have a large file and a list of specific strings. The output should not contain any line that matches one of those strings, nor the line immediately after each match. Two consecutive matches are impossible because of the structure of the file I want to filter. For example:

Specific lines:

'ggg'
'sss'

Input:

'ggg'
'123'
'rrr'
'321'
'sss'
'666'

Output:

'rrr'
'321'

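A simple grep -v -A 1 does not work (with -v, the -A 1 context is printed after each selected non-matching line, so the line following a match is still printed).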

Best Answer

Assumptions:

- we are looking for exact line matches, including white space, punctuation marks and quotes
- matches can occur on consecutive lines, in which case we ignore all of the matches plus the next non-matching line (NOTE: the OP has since commented that consecutive matches are not possible; see the end of this answer for a simplified awk script)

General approach:

- if we find a matching line, we ignore the current line and set a flag to ignore the next line
- if the flag is set, we ignore the current line and clear the flag
- otherwise we print the current line
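For comparison, the same flag-based approach can be sketched in pure bash. This is only an illustration under stated assumptions: filter_matches is a hypothetical helper name, it requires bash 4 or later (for associative arrays), and a read loop like this will be much slower than awk on a large file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop each input line that exactly matches a line in the
# patterns file, plus the line that follows it. Assumes bash 4+ (associative arrays).
filter_matches() {
  local patterns_file=$1 input_file=$2 line skip=0
  local -A wanted=()

  # Store every pattern line as a key; the "k:" prefix avoids an empty key,
  # which bash rejects as an associative-array subscript.
  while IFS= read -r line || [[ -n $line ]]; do
    wanted["k:$line"]=1
  done < "$patterns_file"

  while IFS= read -r line || [[ -n $line ]]; do
    if [[ -n ${wanted["k:$line"]+set} ]]; then
      skip=1                  # matching line: ignore it and flag the next line
    elif (( skip )); then
      skip=0                  # line after a match: ignore it and clear the flag
    else
      printf '%s\n' "$line"   # otherwise print the current line
    fi
  done < "$input_file"
}

# Example: filter_matches lines input
```

As in the awk version, a match is checked before the flag, so consecutive matches keep the flag set and only the next non-matching line is skipped.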

Sample input file:

$ cat input
'ggg'                       # match/ignore and 
'123'                       # ignore
'rrr'
'321'
'sss'                       # match/ignore and 
'666'                       # ignore
'aaa' 'ggg' 'xxx'
'12345'
'xxx'                       # match/ignore and
'xxx'                       # match/ignore and
98352                       # ignore
'xyz'
hello world

Sample set of lines to match on (and ignore):

$ cat lines
'ggg'              # will not match on the line: 'aaa' 'ggg' 'xxx'
'sss'
rrr                # will not match on 'rrr' because of the missing quotes
'xxx'              # will match on consecutive lines and skip the next non-matching line

NOTE: the comments above are annotations only; they do not exist in the actual files.

One awk idea:

awk '
#### 1st file:

FNR==NR { a[$0];  next }       # save line as index in array a[]

#### 2nd file:

$0 in a { skip=1; next }       # if line is an index in array then set the "skip" flag and ignore this line

skip    { skip=0; next }       # if flag is set then clear flag and ignore this line

1                              # otherwise print current line
' lines input

######
# or as a one-liner

awk 'FNR==NR {a[$0];next} $0 in a {skip=1;next} skip {skip=0;next} 1' lines input

This generates:

'rrr'
'321'
'aaa' 'ggg' 'xxx'
'12345'
'xyz'
hello world
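One caveat with the FNR==NR idiom: if the lines file is empty, FNR==NR stays true while the input file is read, so every input line is stored in a[] and nothing is printed. A sketch of a variant that tests the file name instead (drop_matches_and_next is a hypothetical wrapper name; it assumes the lines file is passed first and is not also used as the input file):

```shell
# Hypothetical wrapper: $1 = file of exact lines to drop, $2 = file to filter.
# FILENAME == ARGV[1] is only true while awk reads the first file, so an empty
# patterns file simply leaves a[] empty and every input line is printed.
drop_matches_and_next() {
  awk '
    FILENAME == ARGV[1] { a[$0]; next }       # 1st file: save line as index in a[]
    $0 in a             { skip = 1; next }    # match: ignore it and flag the next line
    skip                { skip = 0; next }    # line after a match: ignore it, clear flag
    1                                         # otherwise print current line
  ' "$1" "$2"
}

# Example: drop_matches_and_next lines input
```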

NOTE: if these assumptions are wrong, or this does not work on the OP's actual files, the question will need to be updated with a more representative set of data.

The OP has added a comment stating that consecutive matching lines cannot occur. This allows us to simplify the code a bit:

awk '
FNR==NR { a[$0];   next }       # 1st file: save line as index in array a[]
$0 in a { getline; next }       # 2nd file: if line is an index in array then read (and discard) the next line and move on to the next input record; otherwise ...
1                               # print current line
' lines input

######
# or as a one-liner

awk 'FNR==NR {a[$0];next} $0 in a {getline;next} 1' lines input

If we remove one of the two consecutive 'xxx' lines from the input file (so it follows the OP's no-consecutive-matches structure), this generates:

'rrr'
'321'
'aaa' 'ggg' 'xxx'
'12345'
'xyz'
hello world

Related Solutions

Shell – How to Delete Lines Containing a Specific String from a Text File

To remove the line and print the output to standard out:

sed '/pattern to match/d' ./infile

To directly modify the file – does not work with BSD sed:

sed -i '/pattern to match/d' ./infile

Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:

sed -i '' '/pattern to match/d' ./infile

To directly modify the file (and create a backup) – works with BSD and GNU sed:

sed -i.bak '/pattern to match/d' ./infile
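If neither form of sed -i is acceptable, a portable sketch is to filter into a temporary file and copy the result back. delete_matching_lines is a hypothetical helper name, and it assumes the pattern is a basic regular expression without an unescaped /.

```shell
# Hypothetical helper: delete lines matching a basic regular expression ($1)
# from a file ($2) without relying on sed -i. Assumes $1 has no unescaped "/".
delete_matching_lines() {
  tmpfile=$(mktemp) || return 1
  # Filter into a temporary file first, so a failing sed never truncates the original.
  if sed "/$1/d" "$2" > "$tmpfile"; then
    # Copy back with cat so the original file keeps its inode and permissions.
    cat "$tmpfile" > "$2"
    rm -f "$tmpfile"
  else
    rm -f "$tmpfile"
    return 1
  fi
}

# Example: delete_matching_lines 'pattern to match' ./infile
```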

Count All Lines of Code in a Directory Recursively – Bash Script

Try:

find . -name '*.php' | xargs wc -l

or, when file names include special characters such as spaces, quotes or newlines (-print0 and -0 are supported by both GNU and BSD find/xargs):

find . -name '*.php' -print0 | xargs -0 wc -l

The SLOCCount tool may help as well.

It reports a count of physical source lines of code (non-blank, non-comment lines) for whatever directory hierarchy you point it at, along with some additional statistics.

Sorted output:

find . -name '*.php' | xargs wc -l | sort -nr
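Note that xargs may run wc several times for a long file list, in which case each run prints its own total line. A sketch for a single grand total, where one wc -l counts the concatenated contents (count_php_lines is a hypothetical helper name):

```shell
# Hypothetical helper: print one line count for all *.php files under a
# directory ($1, default "."). find passes file names to cat in batches
# (-exec ... +), and a single wc -l counts the combined stream, so exactly
# one number is printed no matter how many files there are.
count_php_lines() {
  find "${1:-.}" -name '*.php' -type f -exec cat {} + | wc -l
}

# Example: count_php_lines .
```

Because wc -l counts newline characters, this total equals the sum of the per-file counts.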
