Dynamic Programming – 4 : Sequence Alignment
February 3, 2011 Leave a comment
We are given 2 sequences and . Our task is to produce a valid matching of the elements of and such that the cost of the matching is minimized.
A valid matching denotes a one-to-one correspondence of (some subset of) elements of and with the added condition that there should be no crossing in the matching.i.e. if is matched to , and is matched to , then implies .
X = < e l e p h a n t >
Y = < m o n k e y >
The matching refers to the valid matching where we match and and . Note that there are no crossings in .
Cost of a matching is the sum of:
(1) matching cost for each .
(2) gap cost for each element of or which is not part of .
Given and , we have 2 choices: either include in the optimum matching, or exclude it from the optimum matching.
If is part of optimum matching, then the optimum cost would be plus the optimum cost of matching the sequences and .
Notation: Let denote the optimum cost of matching and .
Therefore, we have:
If is not part of the optimum matching, then one of the following should hold true (because of the no crossing condition):
(1) is not part of the optimum matching.
or, (2) is not part of the optimum matching.
Therefore, we have:
Generalizing the above recurrence, we have, for and :
Also, we have the base conditions that:
for all ; and analogously for .
Pseudocode for computing optimum value:
Initialize OPT [i, 0] and OPT [0,j] as given above.
For i from 1 to m:
For j from 1 to n:
Compute OPT [i,j] using the given recurrence.
End of algorithm.
The above algorithm computes the optimum value in time.
Computing the optimum solution using OPT [i,j] values:
Let us denote :
(1) by Expr1.
(2) by Expr2.
(3) by Expr3.
Function: OPT_Solution (i,j)
If (i = = 0) OR ( j = = 0)
If Expr1 <= both Expr2 and Expr3
OPT_Solution (i-1, j-1);
Print: Index i of X is matched to Index j of Y.
Else if: Expr2 <= both Expr1 and Expr3
OPT_Solution (i, j-1);
OPT_Solution (i-1, j);
End of function.
What we want to compute is OPT_Solution (m, n). The above function computes that in time .