Longest Common Subsequence | DP-4

Last modified: 2021-05-04 11:25:45    Author: Mango

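We have discussed the Overlapping Subproblems and Optimal Substructure properties in Set 1 and Set 2 respectively. We also discussed one example problem in Set 3. Let us discuss the Longest Common Subsequence (LCS) problem as one more example problem that can be solved using Dynamic Programming.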

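LCS problem statement: Given two sequences, find the length of the longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order, but not necessarily contiguously. For example, "abc", "abg", "bdf", "aeg", "acefg", etc. are subsequences of "abcdefg".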
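The definition of a subsequence can be checked mechanically. The following is a minimal sketch in Python; the helper name `is_subsequence` is an illustrative choice, not part of the original article.

```python
def is_subsequence(s, t):
    """Return True if s appears in t in the same relative order (gaps allowed)."""
    it = iter(t)
    # `ch in it` consumes the iterator up to and including the first match,
    # so each character of s must be found after the previous one.
    return all(ch in it for ch in s)


if __name__ == "__main__":
    for candidate in ["abc", "abg", "bdf", "aeg", "acefg", "gfa"]:
        print(candidate, is_subsequence(candidate, "abcdefg"))
```

The last candidate, "gfa", uses only characters of "abcdefg" but in the wrong order, so it is not a subsequence.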

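To gauge the complexity of the brute-force approach, we first need the number of possible subsequences of a string of length n, i.e., the number of subsequences of lengths 1, 2, ..., n. Recall from permutations and combinations that the number of ways to choose 1 element is C(n, 1), the number of ways to choose 2 elements is C(n, 2), and so on. We know that C(n, 0) + C(n, 1) + C(n, 2) + ... + C(n, n) = 2^n. Since we do not consider the subsequence of length 0, a string of length n has 2^n - 1 possible subsequences (counted by position; there are fewer distinct strings if characters repeat). This implies that the time complexity of the brute-force approach is O(n * 2^n), since it takes O(n) time to check whether a subsequence is common to both strings. This time complexity can be improved using dynamic programming.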
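The brute-force idea described above can be sketched in Python as follows. The function names `count_index_choices` and `lcs_brute_force` are illustrative assumptions; the sketch enumerates subsequences of the first string by position, longest first, and tests each against the second string.

```python
from itertools import combinations


def is_subsequence(s, t):
    """Return True if s appears in t in the same relative order."""
    it = iter(t)
    return all(ch in it for ch in s)


def count_index_choices(n):
    """Count the non-empty position subsets of a string of length n."""
    return sum(1 for k in range(1, n + 1) for _ in combinations(range(n), k))


def lcs_brute_force(x, y):
    """Try every non-empty subsequence of x, longest first.

    Exponential in len(x); only practical for very short inputs.
    """
    for length in range(len(x), 0, -1):
        for positions in combinations(range(len(x)), length):
            candidate = "".join(x[i] for i in positions)
            if is_subsequence(candidate, y):  # O(len(y)) check
                return length
    return 0


if __name__ == "__main__":
    print(count_index_choices(6) == 2**6 - 1)
    print(lcs_brute_force("AGGTAB", "GXTXAYB"))
```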

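It is a classic computer science problem, the basis of diff (a file comparison program that outputs the differences between two files), and has applications in bioinformatics.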

Examples:
The LCS of input sequences "ABCDGH" and "AEDFHR" is "ADH", of length 3.
The LCS of input sequences "AGGTAB" and "GXTXAYB" is "GTAB", of length 4.

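The naive solution for this problem is to generate all subsequences of both given sequences and find the longest matching subsequence. This solution is exponential in terms of time complexity. Let us see how this problem possesses both important properties of a Dynamic Programming (DP) problem.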

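1) Optimal Substructure:
Let the input sequences be X[0..m-1] and Y[0..n-1], of lengths m and n respectively, and let L(X[0..m-1], Y[0..n-1]) be the length of the LCS of the two sequences X and Y. Following is the recursive definition of L(X[0..m-1], Y[0..n-1]).

If the last characters of both sequences match (i.e., X[m-1] == Y[n-1]), then
L(X[0..m-1], Y[0..n-1]) = 1 + L(X[0..m-2], Y[0..n-2])

If the last characters of both sequences do not match (i.e., X[m-1] != Y[n-1]), then
L(X[0..m-1], Y[0..n-1]) = MAX(L(X[0..m-2], Y[0..n-1]), L(X[0..m-1], Y[0..n-2]))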

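Examples:
1) Consider the input strings "AGGTAB" and "GXTXAYB". The last characters of the strings match. So the length of the LCS can be written as:
L("AGGTAB", "GXTXAYB") = 1 + L("AGGTA", "GXTXAY")
2) Consider the input strings "ABCDGH" and "AEDFHR". The last characters of the strings do not match. So the length of the LCS can be written as:
L("ABCDGH", "AEDFHR") = MAX(L("ABCDG", "AEDFHR"), L("ABCDGH", "AEDFH"))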

So the LCS problem has the optimal substructure property, as the main problem can be solved using solutions to its subproblems.

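2) Overlapping Subproblems:
Following is a simple recursive implementation of the LCS problem. The implementation simply follows the recursive structure mentioned above.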

C++
/* A Naive recursive implementation of LCS problem */
#include <cstring>
#include <iostream>
using namespace std;
  
int max(int a, int b); 
  
/* Returns length of LCS for X[0..m-1], Y[0..n-1] */
int lcs( char *X, char *Y, int m, int n ) 
{ 
    if (m == 0 || n == 0) 
        return 0; 
    if (X[m-1] == Y[n-1]) 
        return 1 + lcs(X, Y, m-1, n-1); 
    else
        return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n)); 
} 
  
/* Utility function to get max of 2 integers */
int max(int a, int b) 
{ 
    return (a > b)? a : b; 
} 
  
/* Driver code */
int main() 
{ 
    char X[] = "AGGTAB"; 
    char Y[] = "GXTXAYB"; 
      
    int m = strlen(X); 
    int n = strlen(Y); 
      
    cout<<"Length of LCS is "<< lcs( X, Y, m, n ) ; 
      
    return 0; 
} 
  
// This code is contributed by rathbhupendra


C
/* A Naive recursive implementation of LCS problem */
#include <stdio.h>
#include <string.h>
  
int max(int a, int b);
  
/* Returns length of LCS for X[0..m-1], Y[0..n-1] */
int lcs( char *X, char *Y, int m, int n )
{
   if (m == 0 || n == 0)
     return 0;
   if (X[m-1] == Y[n-1])
     return 1 + lcs(X, Y, m-1, n-1);
   else
     return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n));
}
  
/* Utility function to get max of 2 integers */
int max(int a, int b)
{
    return (a > b)? a : b;
}
  
/* Driver program to test above function */
int main()
{
  char X[] = "AGGTAB";
  char Y[] = "GXTXAYB";
  
  int m = strlen(X);
  int n = strlen(Y);
  
  printf("Length of LCS is %d", lcs( X, Y, m, n ) );
  
  return 0;
}


Java
/* A Naive recursive implementation of LCS problem in java*/
public class LongestCommonSubsequence
{
  
  /* Returns length of LCS for X[0..m-1], Y[0..n-1] */
  int lcs( char[] X, char[] Y, int m, int n )
  {
    if (m == 0 || n == 0)
      return 0;
    if (X[m-1] == Y[n-1])
      return 1 + lcs(X, Y, m-1, n-1);
    else
      return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n));
  }
  
  /* Utility function to get max of 2 integers */
  int max(int a, int b)
  {
    return (a > b)? a : b;
  }
  
  public static void main(String[] args)
  {
    LongestCommonSubsequence lcs = new LongestCommonSubsequence();
    String s1 = "AGGTAB";
    String s2 = "GXTXAYB";
  
    char[] X=s1.toCharArray();
    char[] Y=s2.toCharArray();
    int m = X.length;
    int n = Y.length;
  
    System.out.println("Length of LCS is" + " " +
                                  lcs.lcs( X, Y, m, n ) );
  }
  
}
  
// This Code is Contributed by Saket Kumar


Python
# A Naive recursive Python implementation of LCS problem
  
def lcs(X, Y, m, n):
  
    if m == 0 or n == 0:
        return 0
    elif X[m-1] == Y[n-1]:
        return 1 + lcs(X, Y, m-1, n-1)
    else:
        return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n))
  
  
# Driver program to test the above function
X = "AGGTAB"
Y = "GXTXAYB"
print("Length of LCS is", lcs(X, Y, len(X), len(Y)))


C#
/* C#  Naive recursive implementation of LCS problem */
using System;
  
class GFG
{
      
  
    /* Returns length of LCS for X[0..m-1], Y[0..n-1] */
    static int lcs( char[] X, char[] Y, int m, int n )
    {
        if (m == 0 || n == 0)
        return 0;
        if (X[m - 1] == Y[n - 1])
        return 1 + lcs(X, Y, m - 1, n - 1);
        else
        return max(lcs(X, Y, m, n - 1), lcs(X, Y, m - 1, n));
    }
      
    /* Utility function to get max of 2 integers */
    static int max(int a, int b)
    {
        return (a > b)? a : b;
    }
      
    public static void Main()
    {
        String s1 = "AGGTAB";
        String s2 = "GXTXAYB";
      
        char[] X=s1.ToCharArray();
        char[] Y=s2.ToCharArray();
        int m = X.Length;
        int n = Y.Length;
      
        Console.Write("Length of LCS is" + " " 
                      +lcs( X, Y, m, n ) );
    }
}
// This code is Contributed by Sam007


Output:

Length of LCS is 4

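The time complexity of the naive recursive approach above is exponential in the worst case: each call makes at most two further calls and the recursion depth is at most m + n, which gives an O(2^(m+n)) bound (often quoted loosely as O(2^n)). The worst case occurs when all characters of X and Y mismatch, i.e., the length of the LCS is 0.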
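The growth can be observed by counting calls. The following Python sketch repeats the naive recursion with a call counter added; the counter argument and the name `count_calls` are illustrative additions, not part of the original implementations.

```python
def lcs_counting(x, y, m, n, counter):
    """Naive recursive LCS length that also counts how often it is called."""
    counter[0] += 1
    if m == 0 or n == 0:
        return 0
    if x[m - 1] == y[n - 1]:
        return 1 + lcs_counting(x, y, m - 1, n - 1, counter)
    return max(lcs_counting(x, y, m, n - 1, counter),
               lcs_counting(x, y, m - 1, n, counter))


def count_calls(x, y):
    """Return the total number of recursive calls made for inputs x and y."""
    counter = [0]  # one-element list so the recursion can mutate it
    lcs_counting(x, y, len(x), len(y), counter)
    return counter[0]


if __name__ == "__main__":
    # Strings with no character in common trigger the worst case.
    for k in range(1, 9):
        print(k, count_calls("A" * k, "B" * k))
```

For fully mismatched strings of lengths 1, 2, and 3 this makes 3, 11, and 39 calls, while two identical strings of length 3 need only 4.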
Considering the above implementation, following is a partial recursion tree for the input strings "AXYT" and "AYZX":

                         lcs("AXYT", "AYZX")
                       /                     \
         lcs("AXY", "AYZX")                lcs("AXYT", "AYZ")
         /              \                  /               \
lcs("AX", "AYZX") lcs("AXY", "AYZ")   lcs("AXY", "AYZ") lcs("AXYT", "AY")

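In the above partial recursion tree, lcs("AXY", "AYZ") is solved twice. If we draw the complete recursion tree, we can see that many subproblems are solved again and again. So this problem has the Overlapping Subproblems property, and recomputation of the same subproblems can be avoided by using either memoization or tabulation. Following is a tabulated implementation of the LCS problem.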

C++

/* Dynamic Programming C++ implementation of LCS problem */
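#include <cstring>
#include <iostream>
#include <vector>
using namespace std;
  
int max(int a, int b); 
  
/* Returns length of LCS for X[0..m-1], Y[0..n-1] */
int lcs( char *X, char *Y, int m, int n ) 
{ 
    /* (m+1) x (n+1) table; std::vector is used because
       variable-length arrays are not standard C++ */
    vector<vector<int>> L(m + 1, vector<int>(n + 1)); 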
    int i, j; 
      
    /* Following steps build L[m+1][n+1] in 
       bottom up fashion. Note that L[i][j] 
       contains length of LCS of X[0..i-1]
       and Y[0..j-1] */
    for (i = 0; i <= m; i++) 
    { 
        for (j = 0; j <= n; j++) 
        { 
        if (i == 0 || j == 0) 
            L[i][j] = 0; 
      
        else if (X[i - 1] == Y[j - 1]) 
            L[i][j] = L[i - 1][j - 1] + 1; 
      
        else
            L[i][j] = max(L[i - 1][j], L[i][j - 1]); 
        } 
    } 
          
    /* L[m][n] contains length of LCS 
    for X[0..m-1] and Y[0..n-1] */
    return L[m][n]; 
} 
  
/* Utility function to get max of 2 integers */
int max(int a, int b) 
{ 
    return (a > b)? a : b; 
} 
  
// Driver Code
int main() 
{ 
    char X[] = "AGGTAB"; 
    char Y[] = "GXTXAYB"; 
      
    int m = strlen(X); 
    int n = strlen(Y); 
      
    cout << "Length of LCS is " 
         << lcs( X, Y, m, n ); 
      
    return 0; 
} 
  
// This code is contributed by rathbhupendra

C

/* Dynamic Programming C implementation of LCS problem */
#include <stdio.h>
#include <string.h>
   
int max(int a, int b);
   
/* Returns length of LCS for X[0..m-1], Y[0..n-1] */
int lcs( char *X, char *Y, int m, int n )
{
   int L[m+1][n+1];
   int i, j;
   
   /* Following steps build L[m+1][n+1] in bottom up fashion. Note 
      that L[i][j] contains length of LCS of X[0..i-1] and Y[0..j-1] */
   for (i=0; i<=m; i++)
   {
     for (j=0; j<=n; j++)
     {
       if (i == 0 || j == 0)
         L[i][j] = 0;
   
       else if (X[i-1] == Y[j-1])
         L[i][j] = L[i-1][j-1] + 1;
   
       else
         L[i][j] = max(L[i-1][j], L[i][j-1]);
     }
   }
     
   /* L[m][n] contains length of LCS for X[0..m-1] and Y[0..n-1] */
   return L[m][n];
}
   
/* Utility function to get max of 2 integers */
int max(int a, int b)
{
    return (a > b)? a : b;
}
   
/* Driver program to test above function */
int main()
{
  char X[] = "AGGTAB";
  char Y[] = "GXTXAYB";
   
  int m = strlen(X);
  int n = strlen(Y);
   
  printf("Length of LCS is %d", lcs( X, Y, m, n ) );
  
  return 0;
}

Java

/* Dynamic Programming Java implementation of LCS problem */
public class LongestCommonSubsequence
{
  
  /* Returns length of LCS for X[0..m-1], Y[0..n-1] */
  int lcs( char[] X, char[] Y, int m, int n )
  {
    int L[][] = new int[m+1][n+1];
  
    /* Following steps build L[m+1][n+1] in bottom up fashion. Note
         that L[i][j] contains length of LCS of X[0..i-1] and Y[0..j-1] */
    for (int i=0; i<=m; i++)
    {
      for (int j=0; j<=n; j++)
      {
        if (i == 0 || j == 0)
            L[i][j] = 0;
        else if (X[i-1] == Y[j-1])
            L[i][j] = L[i-1][j-1] + 1;
        else
            L[i][j] = max(L[i-1][j], L[i][j-1]);
      }
    }
  return L[m][n];
  }
  
  /* Utility function to get max of 2 integers */
  int max(int a, int b)
  {
    return (a > b)? a : b;
  }
  
  public static void main(String[] args)
  {
    LongestCommonSubsequence lcs = new LongestCommonSubsequence();
    String s1 = "AGGTAB";
    String s2 = "GXTXAYB";
  
    char[] X=s1.toCharArray();
    char[] Y=s2.toCharArray();
    int m = X.length;
    int n = Y.length;
  
    System.out.println("Length of LCS is" + " " +
                                  lcs.lcs( X, Y, m, n ) );
  }
  
}
  
// This Code is Contributed by Saket Kumar

Python

# Dynamic Programming implementation of LCS problem
  
def lcs(X , Y):
    # find the length of the strings
    m = len(X)
    n = len(Y)
  
    # declaring the array for storing the dp values
    L = [[None]*(n+1) for i in range(m+1)]
  
    """Following steps build L[m+1][n+1] in bottom up fashion
    Note: L[i][j] contains length of LCS of X[0..i-1]
    and Y[0..j-1]"""
    for i in range(m+1):
        for j in range(n+1):
            if i == 0 or j == 0 :
                L[i][j] = 0
            elif X[i-1] == Y[j-1]:
                L[i][j] = L[i-1][j-1]+1
            else:
                L[i][j] = max(L[i-1][j] , L[i][j-1])
  
    # L[m][n] contains the length of LCS of X[0..m-1] & Y[0..n-1]
    return L[m][n]
#end of function lcs
  
  
# Driver program to test the above function
X = "AGGTAB"
Y = "GXTXAYB"
print("Length of LCS is", lcs(X, Y))
  
# This code is contributed by Nikhil Kumar Singh(nickzuck_007)

C#

// Dynamic Programming C# implementation 
// of LCS problem 
using System;
  
class GFG
{
  
    /* Returns length of LCS for X[0..m-1], Y[0..n-1] */
    static int lcs( char[] X, char[] Y, int m, int n )
    {
        int [,]L = new int[m+1,n+1];
      
        /* Following steps build L[m+1][n+1] 
        in bottom up fashion. Note
        that L[i][j] contains length of 
        LCS of X[0..i-1] and Y[0..j-1] */
        for (int i = 0; i <= m; i++)
        {
            for (int j = 0; j <= n; j++)
            {
                if (i == 0 || j == 0)
                    L[i, j] = 0;
                else if (X[i - 1] == Y[j - 1])
                    L[i, j] = L[i - 1, j - 1] + 1;
                else
                    L[i, j] = max(L[i - 1, j], L[i, j - 1]);
            }
        }
        return L[m, n];
    }
      
    /* Utility function to get max of 2 integers */
    static int max(int a, int b)
    {
        return (a > b)? a : b;
    }
      
    // Driver code
    public static void Main()
    {
          
        String s1 = "AGGTAB";
        String s2 = "GXTXAYB";
      
        char[] X=s1.ToCharArray();
        char[] Y=s2.ToCharArray();
        int m = X.Length;
        int n = Y.Length;
      
        Console.Write("Length of LCS is" + " " +lcs( X, Y, m, n ) );
    }
}
// This Code is Contributed by Sam007 


Output:

Length of LCS is 4

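The time complexity of the above implementation is O(mn), which is much better than the worst-case time complexity of the naive recursive implementation.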
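Memoization, the other technique mentioned earlier, reaches the same O(mn) bound while keeping the recursive structure. The following is a minimal Python sketch, assuming the standard-library `functools.lru_cache` as the cache; the name `lcs_memo` is illustrative.

```python
from functools import lru_cache


def lcs_memo(x, y):
    """Top-down LCS length: the naive recursion plus a cache of solved subproblems."""

    @lru_cache(maxsize=None)
    def solve(m, n):
        # solve(m, n) is the LCS length of x[0..m-1] and y[0..n-1]
        if m == 0 or n == 0:
            return 0
        if x[m - 1] == y[n - 1]:
            return 1 + solve(m - 1, n - 1)
        return max(solve(m, n - 1), solve(m - 1, n))

    return solve(len(x), len(y))


if __name__ == "__main__":
    print("Length of LCS is", lcs_memo("AGGTAB", "GXTXAYB"))
```

Each (m, n) pair is computed once, so there are at most (m+1)(n+1) distinct calls. For long inputs the recursion depth can exceed Python's default recursion limit, which is one practical reason to prefer the tabulated version.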

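The above algorithm/code returns only the length of the LCS. Please see the following post for printing the LCS.
Printing Longest Common Subsequence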
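As a rough sketch of how the filled table can also yield the subsequence itself (this is not the code from the post referenced above, and the name `lcs_string` is an illustrative choice), one common approach walks back from L[m][n] to the top-left corner:

```python
def lcs_string(x, y):
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    # L[i][j] holds the LCS length of x[0..i-1] and y[0..j-1]
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Walk back from the bottom-right cell, collecting matched characters.
    i, j = m, n
    picked = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            picked.append(x[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(picked))


if __name__ == "__main__":
    print(lcs_string("AGGTAB", "GXTXAYB"))
```

When several subsequences share the maximum length, the tie-breaking rule in the walk decides which one is returned.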

You can also check the space-optimized version of LCS at:
Space Optimized Solution of LCS
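One way such a space optimization can work, sketched here under the assumption that only the length is needed (this is not necessarily the approach of the post referenced above, and `lcs_two_rows` is an illustrative name), is to keep just the previous and current rows of the table, since row i depends only on row i-1:

```python
def lcs_two_rows(x, y):
    """LCS length using two rows of the table instead of the full (m+1) x (n+1) grid."""
    if len(y) > len(x):
        x, y = y, x  # make y the shorter string so the rows are as short as possible
    prev = [0] * (len(y) + 1)  # row i-1 of the table
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)  # row i; curr[0] stays 0
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(y)]


if __name__ == "__main__":
    print("Length of LCS is", lcs_two_rows("AGGTAB", "GXTXAYB"))
```

The running time is still O(mn), but the extra space drops to O(min(m, n)). The trade-off is that the full table is no longer available for recovering the subsequence itself.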

References:
http://www.youtube.com/watch?v=V5hZoJ6uK-s
http://www.algorithmist.com/index.php/Longest_Common_Subsequence
http://www.ics.uci.edu/~eppstein/161/960229.html
http://en.wikipedia.org/wiki/Longest_common_subsequence_problem