### Question

dynamic_programming

תת-מחרוזת משותפת ארוכה ביותר - יהיו $w_1=a_1a_1...a_n , w_2=b_1b_2...b_m$ מחרוזות מעל אותו אלפבית, תת מחרוזת היא תת סדרה עולה של אותיות מתוך מחרוזת. (עולה ביחס לאינדקסים), הציעו אלגוריתם שמחזיר את אורך תת המחרוזת המשותפת (תת המחרוזת המופיעה בשתי המחרוזות) הארוכה ביותר הארוכה ביותר וכן את תת המחרוזת עצמה.

1

Lets us define the next recursive expression: $S(i,j)$ the longest common sub-string of $a_1a_2...a_i , b_1b_2...b_j$. So our solution will be $S(n,m)$

• $S(0,j) = S(i,0) = 0$ - A definition for empty words.
• $S(i,j) = 1+S(i-1, j-1)$ in case $a_i = a_j$ or $max{S(i-1,j) , S(i,j-1)}$ otherwise.

algorithm:

1. Start an $n + 1\times m+1$ matrix full with $0$'s.
2. Fill the matrix diagonally (right to left diagonals starting at $S(1,1)$ then $S(1,2) , S(2,1)$, then $S(1,3) , S(2,2), S(3,2)$ and do forth ) using the $S(i,j)$ expressions (pay attention that $S(0,j) , S(i,0)$ are already filled).In addition: at each time when $S(i,j) = 1+(S(i-1),(j-1))$ save the pair $(i,j)$ (those pairs are the sub-string's indexes in $w_1 , w_2$ respectively).
3. Return $S(n,m)$, and the sub-string formed from the saved indexes.

Complexity $O(nm)$ time, $O(max\{m,n\})$ space.

Correctness:

• Let us prove by induction that for all $(i,j)$ , $S(i,j)$ values is correct.
• Base case: $i = 0$ so, for all $j\in [m]$ the longest common sub-string is the empty string from length $0$, and $S(0,j)$ is defined to be $0$ as needed.
• Assumption: for all $1\le i < k$ and *for all* $j\in [m]$ $S(i,j)$ holds the correct values.
• Step: (now we start an induction over j values when $i=k$ through the whole induction process).
• Base: $j=0$ as explained before the longest common sub-string is from length $0$, and $S(k,0) = 0$ (as defined).
• Assumption : for all $1\le j < l$ we assume $S(k,j)$ holds the right value.
• Step:
1. In case $a_k \neq b_l$: the longest common sub-string can't include both $a_k,b_k$ so the longest common sub-string is the maximal between the longest sub-string formed from $a_1...a_k, b_1...b_{l-1}$ or from $a_1...a_{k-1} , b_1...b_l$. In this case, the induction's assumption promises $S(k,l-1) , S(k-1,l)$ holds the correct values, so $S(k,l) = max\{S(k,l-1) , S(k-1,l)\}$ is the correct value.
2. In case $a_k = b_l$, let us assume there exists a common sub-string $S$ such that $|S| > S(k-1,l-1) +1$ -:
• in case $S$ is formed from the indexes $k,l$ from $w_1 ,w_2$ respectively, then $S = S' \cup (a_k,b_l)$ when $S'$ is formed from the indexes $[k-1],[l-1]$ from our induction assumption we know that $S(k-1,l-1) > |S'|$ therefore $|S'|\le S(k-1,l-1)+1$ which contradicts the assumption.
• in case $S'$ is formed from $k$ and not from $l$ (means a couple for $a_k$ is taken from $b_1b_2...b_{l-1}$ then $|S'| < S(k-1,l)$ by the induction assumption, Now we need to make the last observation: $S(k-1,l) \le S(k-1,l-1)+1$, this holds due to the fact that in $S(k-1,l)$ one may take at list one indexes pair $(a_i,b_j)$ which doesn't appear in $S(k-1,l-1)$ which is an indexes pair of the form $(i,l)$, thus by adding $1$ to $S(k-1,l-1)$ we may bound the difference of $S(k-1,l-1), S(k-1,l)$. So in conclusion $|S'| < S(k-1,l) \le S(k-1,l-1)+1 = S(i,j)$ so this contradicts $S'$ existence.
• The case of $S'$ is formed from $l$ but not from $k$ is symmetric.

This finishes the prove.

0

Lets us define the next recursive expression: $S(i,j)$ the longest common sub-string of $a_1a_2...a_i , b_1b_2...b_j$. So our solution will be $S(n,m)$

• $S(0,j) = S(i,0) = 0$ - A definition for empty words.
• $S(i,j) = 1+S(i-1, j-1)$ in case $a_i = a_j$ or $max{S(i-1,j) , S(i,j-1)}$ otherwise.

algorithm:

1. Start an $n + 1\times m+1$ matrix full with $0$'s.
2. Fill the matrix diagonally (right to left diagonals starting at $S(1,1)$ then $S(1,2) , S(2,1)$, then $S(1,3) , S(2,2), S(3,2)$ and do forth ) using the $S(i,j)$ expressions (pay attention that $S(0,j) , S(i,0)$ are already filled).In addition: at each time when $S(i,j) = 1+(S(i-1),(j-1))$ save the pair $(i,j)$ (those pairs are the sub-string's indexes in $w_1 , w_2$ respectively).
3. Return $S(n,m)$, and the sub-string formed from the saved indexes.

Complexity $O(nm)$ time, $O(max\{m,n\})$ space.

Correctness:

• Let us prove by induction that for all $(i,j)$ , $S(i,j)$ values is correct.
• Base case: $i = 0$ so, for all $j\in [m]$ the longest common sub-string is the empty string from length $0$, and $S(0,j)$ is defined to be $0$ as needed.
• Assumption: for all $1\le i < k$ and *for all* $j\in [m]$ $S(i,j)$ holds the correct values.
• Step: (now we start an induction over j values when $i=k$ through the whole induction process).
• Base: $j=0$ as explained before the longest common sub-string is from length $0$, and $S(k,0) = 0$ (as defined).
• Assumption : for all $1\le j < l$ we assume $S(k,j)$ holds the right value.
• Step:
1. In case $a_k \neq b_l$: the longest common sub-string can't include both $a_k,b_k$ so the longest common sub-string is the maximal between the longest sub-string formed from $a_1...a_k, b_1...b_{l-1}$ or from $a_1...a_{k-1} , b_1...b_l$. In this case, the induction's assumption promises $S(k,l-1) , S(k-1,l)$ holds the correct values, so $S(k,l) = max\{S(k,l-1) , S(k-1,l)\}$ is the correct value.
2. In case $a_k = b_l$, let us assume there exists a common sub-string $S$ such that $|S| > S(k-1,l-1) +1$ -:
• in case $S$ is formed from the indexes $k,l$ from $w_1 ,w_2$ respectively, then $S = S' \cup (a_k,b_l)$ when $S'$ is formed from the indexes $[k-1],[l-1]$ from our induction assumption we know that $S(k-1,l-1) > |S'|$ therefore $|S'|\le S(k-1,l-1)+1$ which contradicts the assumption.
• in case $S'$ is formed from $k$ and not from $l$ (means a couple for $a_k$ is taken from $b_1b_2...b_{l-1}$ then $|S'| < S(k-1,l)$ by the induction assumption, Now we need to make the last observation: $S(k-1,l) \le S(k-1,l-1)+1$, this holds due to the fact that in $S(k-1,l)$ one may take at list one indexes pair $(a_i,b_j)$ which doesn't appear in $S(k-1,l-1)$ which is an indexes pair of the form $(i,l)$, thus by adding $1$ to $S(k-1,l-1)$ we may bound the difference of $S(k-1,l-1), S(k-1,l)$. So in conclusion $|S'| < S(k-1,l) \le S(k-1,l-1)+1 = S(i,j)$ so this contradicts $S'$ existence.
• The case of $S'$ is formed from $l$ but not from $k$ is symmetric.

This finishes the prove.

0
Feedback