## Contextual BLAST Project

We present CTX-BLAST, a software tool that incorporates a contextual alignment model into the popular protein BLAST program. This efficient alignment tool lets us investigate the effect of context dependency in protein alignment at genomic scale. The software uses non-symmetric contextual substitution tables and calculates the statistical significance of a given alignment according to the contextual statistical model.

The following command runs the standard version of BLAST:

blast -p program -d database -i query -o output_file

where program is in our case blastp (i.e. the protein version of the BLAST tool),

database is the name of the database to scan for significant alignments,

query is the query sequence in FASTA format.

The contextual version of BLAST requires two additional parameters:

-C defines the contextual substitution table, e.g. CTX-BLOSUM62, the contextual counterpart of BLOSUM62,

-x defines the file containing the parameters for the statistical significance calculation, in the following order: alpha, beta, lambda and K.
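To illustrate how these four parameters are typically used, here is a minimal Python sketch. It assumes the parameter file holds four whitespace-separated numbers in the order alpha, beta, lambda, K (the exact file layout is not specified here), and it applies the edge-corrected E-value formula described in [1], where the expected alignment length is alpha * S + beta. Whether CTX-BLAST computes significance in exactly this way is an assumption; the function names and the numbers in the usage line are hypothetical.

```python
import math
from typing import NamedTuple


class StatParams(NamedTuple):
    alpha: float  # slope of expected alignment length versus score
    beta: float   # intercept of expected alignment length versus score
    lam: float    # lambda, the scale parameter of the score distribution
    k: float      # K, the search-space prefactor


def parse_stat_params(text: str) -> StatParams:
    """Parse 'alpha beta lambda K' from whitespace-separated text (assumed layout)."""
    values = [float(token) for token in text.split()]
    if len(values) != 4:
        raise ValueError(f"expected 4 values, got {len(values)}")
    return StatParams(*values)


def e_value(score: float, query_len: int, subject_len: int, p: StatParams) -> float:
    """Edge-corrected E-value for one query/subject pair, following [1]."""
    # Expected length of an alignment with this score.
    ell = p.alpha * score + p.beta
    # Effective lengths: an alignment cannot start too close to the sequence ends.
    m_eff = max(query_len - ell, 1.0)
    n_eff = max(subject_len - ell, 1.0)
    return p.k * m_eff * n_eff * math.exp(-p.lam * score)


# Illustrative values only, not estimates for any real substitution table.
params = parse_stat_params("1.0 0.0 0.5 0.1")
print(e_value(10, 110, 1010, params))
```

For a database search, the subject length would be replaced by the effective database length; the sketch covers only a single pair of sequences.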

An example invocation of CTX-BLAST on the COG database "kyva" is the following:

blast -p blastp -d kyva -i query.txt -o outputctxb.htm -C CTX-BLOSUM62 -x stat.txt
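When many queries have to be processed, the invocation above can be assembled from a script. The following Python sketch only builds the argument list using the flags shown above; the helper name `ctx_blast_command` is hypothetical, and the binary name `blast` is taken from the example.

```python
import shlex


def ctx_blast_command(database, query, output, matrix, stat_file, program="blastp"):
    """Build the CTX-BLAST argument list from the flags documented above."""
    return [
        "blast",
        "-p", program,    # BLAST program, blastp for proteins
        "-d", database,   # database to scan
        "-i", query,      # query sequence in FASTA format
        "-o", output,     # output file
        "-C", matrix,     # contextual substitution table
        "-x", stat_file,  # file with alpha, beta, lambda and K
    ]


cmd = ctx_blast_command("kyva", "query.txt", "outputctxb.htm", "CTX-BLOSUM62", "stat.txt")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once CTX-BLAST is installed and on the search path.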

To estimate the parameters for statistical significance, run the program island, which implements the island method in the contextual model (the program is freely available together with CTX-BLAST).

Here is an example invocation:

island --phi=phi.txt --matrix=CTX-BLOSUM62 --align=LOCAL --wordsize=7000 --islecutoff=20 \
--iterisle=250 --gapopen=11 --gapcont=1 --output=bl62_1 --border=1000

The parameter wordsize defines the length of the compared sequences and border is the frame width (see [1] for details). The gap penalty is defined by the parameters gapopen and gapcont. The number of comparisons to perform is given by iterisle, and islecutoff is the threshold value for the islands. The output parameter specifies the output file. From the statistics stored in this file we estimate the parameters lambda, K, alpha and beta using the R script isle.R (available in downloads).
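As a rough illustration of the estimation step, the Python sketch below applies the standard maximum-likelihood estimators of the island method for integer scores, as described in [1]: lambda = ln(1 + 1 / (mean(S) - c)) over islands with peak score S >= c, and K = R_c * exp(lambda * c) / A, where R_c is the number of such islands and A is the total search area. It is not a port of isle.R: the input (a plain list of island peak scores), the function name, and the way the area is supplied are all assumptions, and the estimation of alpha and beta is not covered.

```python
import math


def estimate_lambda_k(island_scores, cutoff, area):
    """Estimate lambda and K from island peak scores (integer scoring assumed).

    island_scores: peak scores of all islands found (assumed input format)
    cutoff:        island score threshold c (cf. islecutoff)
    area:          total search area A, supplied by the caller
    """
    kept = [s for s in island_scores if s >= cutoff]
    if not kept:
        raise ValueError("no islands at or above the cutoff")
    excess = sum(kept) / len(kept) - cutoff
    if excess <= 0:
        raise ValueError("mean island score must exceed the cutoff")
    # Geometric tail above the cutoff gives the closed-form MLE for lambda.
    lam = math.log(1.0 + 1.0 / excess)
    # Expected island count above the cutoff is K * A * exp(-lambda * c).
    k = len(kept) * math.exp(lam * cutoff) / area
    return lam, k


# Toy data chosen so the result is easy to check by hand.
lam, k = estimate_lambda_k([5, 20, 21, 22], cutoff=20, area=3 * 2**20)
print(lam, k)
```

A higher cutoff reduces the bias from low-scoring islands but leaves fewer islands to average over, so the estimate becomes noisier.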

[1] Altschul, S.F., Bundschuh, R., Olsen, R., & Hwa, T. The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Research (2001), Vol. 29, No. 2, 351-361.

All programs can be downloaded here.

dinero47 at gmail.com

pw201111 at students.mimuw.edu.pl