## Dynamic Programming – 4 : Sequence Alignment

February 3, 2011 Leave a comment

**Sequence Alignment **

We are given 2 sequences and . Our task is to produce a valid matching of the elements of and such that the cost of the matching is minimized.

Terminology:

A valid matching denotes a one-to-one correspondence of (some subset of) elements of and with the added condition that there should be *no crossing *in the matching.i.e. if is matched to , and is matched to , then implies .

e.g.

X = < e l e p h a n t >

Y = < m o n k e y >

The matching refers to the valid matching where we match and and . Note that there are no crossings in .

—

Cost of a matching is the sum of:

(1) matching cost for each .

(2) gap cost for each element of or which is not part of .

—

**Solution. **

Given and , we have 2 choices: either include in the optimum matching, or exclude it from the optimum matching.

**Case 1: **

If is part of optimum matching, then the optimum cost would be plus the optimum cost of matching the sequences and .

**Notation: **Let denote the optimum cost of matching and .

Therefore, we have:

—

**Case 2: **

If is not part of the optimum matching, then one of the following should hold true (because of the no crossing condition):

(1) is not part of the optimum matching.

or, (2) is not part of the optimum matching.

Therefore, we have:

—-

Generalizing the above recurrence, we have, for and :

Also, we have the base conditions that:

for all ; and analogously for .

—-

**Pseudocode for computing optimum value: **

**Initialize OPT [i, 0] and OPT [0,j] as given above. **

**For i from 1 to m: **

**For j from 1 to n:**

**Compute OPT [i,j] using the given recurrence. **

**End of algorithm. **

**—-**

The above algorithm computes the optimum value in time.

—-

**Computing the optimum solution using OPT [i,j] values: **

Let us denote :

(1) by Expr1.

(2) by Expr2.

(3) by Expr3.

**Function: OPT_Solution (i,j) **

**If (i = = 0) OR ( j = = 0) **

**Return; **

**If Expr1 <= both Expr2 and Expr3**

**OPT_Solution (i-1, j-1); **

**Print: Index i of X is matched to Index j of Y. **

**Else if: Expr2 <= both Expr1 and Expr3**

**OPT_Solution (i, j-1); **

**Else: **

**OPT_Solution (i-1, j); **

**End of function. **

**—-**

What we want to compute is OPT_Solution (m, n). The above function computes that in time .

—-