Parsing Sequences (Shell Scripting)

Posted by: admin  :  Category: Technical

Simple SHELL script for parsing BLAST output

1. To parse the sequence names from BLAST output.

“grep” is a powerful Unix command for retrieving the lines of a file that match a particular pattern.

Syntax:
grep "pattern" input_file

Example: grep ">" Blast_output.txt

In the above example, grep retrieves every line that contains the “>” symbol. In a BLAST output file, each sequence name starts with “>”, so this command lists all the sequence names in the file.
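A minimal sketch of the same idea wrapped in reusable shell functions, assuming a BLAST report in which every definition line begins with “>”. The function names extract_names and count_names are hypothetical. Anchoring the pattern with ^ avoids matching a “>” that appears later in a line, and sed strips the leading “>” so only the name remains.

```shell
#!/bin/sh
# extract_names: print the sequence names from a BLAST report.
# Hypothetical helper; assumes definition lines start with ">".
extract_names() {
    # "^>" matches only at the start of a line;
    # sed then removes that leading ">" from each name.
    grep "^>" "$1" | sed 's/^>//'
}

# count_names: print how many sequence names the report contains.
# grep -c prints the number of matching lines instead of the lines.
count_names() {
    grep -c "^>" "$1"
}
```

For example, extract_names Blast_output.txt > names.txt saves the names, and count_names Blast_output.txt reports how many hits there are.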


2. Parsing the Sequence names and the sequences from the BLAST output 

“egrep” is a powerful command for retrieving the lines of a file that match any of several patterns. (It is equivalent to grep -E.)

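Syntax:
egrep "pattern1|pattern2|pattern3" filename

Do not put spaces around “|”: they become part of the pattern, so "pattern1 | pattern2" would only match “pattern1 ” or “ pattern2” with the spaces included.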
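A minimal sketch of a multi-pattern search on a BLAST report, written with grep -E (the same as egrep). It assumes the older BLAST text format in which subject lines begin with “Sbjct:”; the function name extract_hits is hypothetical. The single regular expression "^>|^Sbjct" has two alternatives and no spaces around “|”.

```shell
#!/bin/sh
# extract_hits: print definition lines and subject-sequence lines.
# Hypothetical helper; assumes names start with ">" and subject
# lines start with "Sbjct" (the match is case-sensitive).
extract_hits() {
    grep -E "^>|^Sbjct" "$1"
}
```

Query lines and the alignment bars between them are skipped, so only the names and the subject sequences remain.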

Example:
Below is a combination of a shell command and a Perl script for parsing the BLAST output.

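egrep "^>|^Sbjct" BLAST_output.txt | sed 's/Sbjct://' > output.txt

use strict;
use warnings;

# Read the lines selected by egrep and cleaned by sed above
open(my $fh, '<', 'output.txt') or die "Cannot open output.txt: $!";
while (my $ln = <$fh>)
{
    if ($ln !~ m/^>/)
    {
        # Sequence line "start SEQUENCE end": print the middle field
        my @temp = split(' ', $ln);
        print "$temp[1]\n";
    }
    else
    {
        # Sequence name line: print it unchanged
        print $ln;
    }
}
close($fh);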

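In the above example, egrep retrieves the lines that start with “>” or “Sbjct”, sed removes the “Sbjct:” label, and the result is stored in output.txt. The Perl script then prints each sequence name unchanged and, for every Sbjct line, prints only the sequence (the second whitespace-separated field, between the start and end coordinates).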
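The same parse can also be done in a single pass with awk instead of the egrep, sed and Perl steps; this is a swapped-in technique, not the method described above. A minimal sketch, assuming the older BLAST text format where subject lines look like “Sbjct: 1   ATGC...   60”; the function name parse_blast is hypothetical.

```shell
#!/bin/sh
# parse_blast: print sequence names and subject sequences in one pass.
# Hypothetical helper; an awk alternative to the egrep/sed/Perl steps.
# Assumes subject lines look like "Sbjct: start SEQUENCE end".
parse_blast() {
    awk '
        /^>/      { print; next }   # sequence name: print unchanged
        /^Sbjct:/ { print $3 }      # fields: "Sbjct:", start, SEQUENCE, end
    ' "$1"
}
```

Running parse_blast BLAST_output.txt > output.txt produces the same name and sequence listing without an intermediate file.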
