📜  Cocke–Younger–Kasami (CYK) Algorithm (1)

📅  Last modified: 2023-12-03 14:59:57.695000    🧑  Author: Mango

Cocke–Younger–Kasami (CYK) Algorithm

The Cocke–Younger–Kasami (CYK) algorithm is a parsing algorithm that decides whether a given string can be generated by a context-free grammar. It is a bottom-up procedure based on dynamic programming: it first determines which non-terminals derive each single symbol of the string, then combines those results to cover progressively longer substrings.

Applications

The CYK algorithm is used in computational linguistics, natural language processing, and speech recognition, including:

  • Parsing ambiguous natural-language sentences
  • Recognizing patterns in DNA sequences
  • Grammar induction
  • Machine translation
  • Part-of-speech tagging

Algorithm

The CYK algorithm works as follows:

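  1. Start with a context-free grammar in Chomsky normal form, in which every rule has the form A -> BC or A -> a.
  2. Create an n x n matrix, where n is the length of the input string. Cell (i, j), with i <= j, holds the non-terminals that derive the substring from position i to position j.
  3. Initialize the matrix diagonal: for each position i, add to cell (i, i) every non-terminal A that has a rule A -> a, where a is the i-th input symbol.
  4. For each cell (i, j) in the matrix such that i < j, visited in order of increasing substring length j - i + 1 so that shorter substrings are always filled in first:
    1. For each k from i to j-1:
      1. For each production rule (A -> BC) in the grammar:
        1. If cell (i, k) contains the non-terminal B and cell (k+1, j) contains the non-terminal C, then add A to cell (i, j).
  5. If the start symbol appears in cell (1, n), then the input string can be generated by the grammar. Otherwise, it cannot.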
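The steps above can be sketched in Python. This is a minimal recognizer under stated assumptions, not a reference implementation: the function name `cyk_recognize`, the rule encoding (pairs of head and terminal, and pairs of head and two-symbol body), and the toy grammar for a^n b^n are all chosen here for illustration. The sketch uses 0-based indices, so cell (1, n) of the text corresponds to `table[0][n - 1]`.

```python
def cyk_recognize(word, terminal_rules, binary_rules, start="S"):
    """Return True if `word` is derivable from `start`.

    Assumes the grammar is in Chomsky normal form:
    terminal_rules is a list of (head, terminal) pairs for rules A -> a,
    binary_rules is a list of (head, (left, right)) pairs for rules A -> BC.
    """
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle an S -> epsilon rule

    # table[i][j] holds the non-terminals deriving word[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]

    # Step 3: fill the diagonal from the terminal rules
    for i, symbol in enumerate(word):
        table[i][i] = {head for head, t in terminal_rules if t == symbol}

    # Step 4: fill the remaining cells by increasing substring length
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split into word[i..k] and word[k+1..j]
                for head, (left, right) in binary_rules:
                    if left in table[i][k] and right in table[k + 1][j]:
                        table[i][j].add(head)

    # Step 5: accept if the start symbol derives the whole string
    return start in table[0][n - 1]


# Illustrative CNF grammar for the language a^n b^n (n >= 1):
#   S -> AB | AC,  C -> SB,  A -> a,  B -> b
TERMINAL_RULES = [("A", "a"), ("B", "b")]
BINARY_RULES = [("S", ("A", "B")), ("S", ("A", "C")), ("C", ("S", "B"))]

print(cyk_recognize("aabb", TERMINAL_RULES, BINARY_RULES))  # True
print(cyk_recognize("aab", TERMINAL_RULES, BINARY_RULES))   # False
```

Iterating over substring lengths in the outer loop is what guarantees that cells (i, k) and (k+1, j) are complete before cell (i, j) reads them.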
Example

Consider the grammar G = (V, Σ, P, S) where V = {S, A, B}, Σ = {a, b}, and P is given by the following rules:

  • S -> AB | BA
  • A -> BB | a
  • B -> AB | b
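Written as data, the rules above can be checked against the Chomsky normal form requirement mechanically. The sketch below is one possible encoding; the names `GRAMMAR`, `TERMINALS`, and `is_cnf` are assumptions made for illustration, and the check ignores the optional S -> ε rule that some definitions of Chomsky normal form allow.

```python
# Each head maps to its right-hand sides; a right-hand side is a tuple of symbols.
GRAMMAR = {
    "S": [("A", "B"), ("B", "A")],
    "A": [("B", "B"), ("a",)],
    "B": [("A", "B"), ("b",)],
}
TERMINALS = {"a", "b"}


def is_cnf(grammar, terminals):
    """True if every rule is A -> BC (two non-terminals) or A -> a (one terminal)."""
    nonterminals = set(grammar)
    for bodies in grammar.values():
        for body in bodies:
            if len(body) == 1 and body[0] in terminals:
                continue  # rule of the form A -> a
            if len(body) == 2 and all(s in nonterminals for s in body):
                continue  # rule of the form A -> BC
            return False
    return True


print(is_cnf(GRAMMAR, TERMINALS))  # True: the grammar needs no conversion
```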

Let us check whether the string "baab" can be generated by the grammar using the CYK algorithm.

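  1. Check that the grammar is in Chomsky normal form. Every rule already has the form A -> BC or A -> a, so no conversion is needed:
  • S -> AB | BA
  • A -> BB | a
  • B -> AB | b
  2. Create a 4 x 4 matrix. Cell (i, j) is in row i and column j, and only cells with i <= j are used:

| | 1 | 2 | 3 | 4 |
|----|-----|-----|-----|-----|
| 1 | | | | |
| 2 | | | | |
| 3 | | | | |
| 4 | | | | |

  3. Initialize the matrix diagonal with the non-terminals that produce each symbol of "baab" (B -> b for positions 1 and 4, A -> a for positions 2 and 3):

| | 1 | 2 | 3 | 4 |
|----|-----|-----|-----|-----|
| 1 | B | | | |
| 2 | | A | | |
| 3 | | | A | |
| 4 | | | | B |

  4. Fill in the rest of the matrix, one substring length at a time. The symbol ∅ marks a cell to which no rule applies.

Length 2: cell (1, 2) combines B and A, giving S (S -> BA). Cell (2, 3) combines A and A, and no rule has AA on its right-hand side. Cell (3, 4) combines A and B, giving S and B (S -> AB, B -> AB).

| | 1 | 2 | 3 | 4 |
|----|-----|-----|-----|-----|
| 1 | B | S | | |
| 2 | | A | ∅ | |
| 3 | | | A | S,B |
| 4 | | | | B |

Length 3: cell (1, 3) stays empty, because the split k = 1 pairs B with the empty cell (2, 3) and the split k = 2 pairs S with A, which matches no rule. Cell (2, 4) gets S and B from the split k = 2, which pairs A from cell (2, 2) with B from cell (3, 4).

| | 1 | 2 | 3 | 4 |
|----|-----|-----|-----|-----|
| 1 | B | S | ∅ | |
| 2 | | A | ∅ | S,B |
| 3 | | | A | S,B |
| 4 | | | | B |

Length 4: cell (1, 4) gets A from the split k = 1, which pairs B from cell (1, 1) with B from cell (2, 4) (A -> BB). The splits k = 2 and k = 3 add nothing.

| | 1 | 2 | 3 | 4 |
|----|-----|-----|-----|-----|
| 1 | B | S | ∅ | A |
| 2 | | A | ∅ | S,B |
| 3 | | | A | S,B |
| 4 | | | | B |

  5. Check if the start symbol appears in cell (1, 4). The cell contains only A. Since "S" does not appear in cell (1, 4), the string "baab" cannot be generated by the grammar.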
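The worked example can be cross-checked by computing the table for "baab" programmatically. The sketch below is illustrative only: `cyk_table` and the dictionary encoding of the grammar are names assumed here, and the table is keyed by the same 1-based (i, j) pairs as the matrix.

```python
def cyk_table(word, grammar):
    """Return {(i, j): set of non-terminals deriving word positions i..j}, 1-based."""
    n = len(word)
    table = {}
    # Diagonal: non-terminals with a rule producing the i-th symbol
    for i in range(1, n + 1):
        table[i, i] = {h for h, bodies in grammar.items() if (word[i - 1],) in bodies}
    # Remaining cells, by increasing substring length
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):
                for head, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in table[i, k]
                                and body[1] in table[k + 1, j]):
                            cell.add(head)
            table[i, j] = cell
    return table


GRAMMAR = {
    "S": [("A", "B"), ("B", "A")],
    "A": [("B", "B"), ("a",)],
    "B": [("A", "B"), ("b",)],
}

table = cyk_table("baab", GRAMMAR)
for (i, j), cell in sorted(table.items()):
    print(f"cell ({i}, {j}): {sorted(cell)}")

print("S" in table[1, 4])  # False: "baab" is not generated from S
```

Cell (1, 4) comes out as `['A']`, so the string is derivable from A but not from the start symbol S.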
Complexity

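The time complexity of the CYK algorithm is O(n^3 * |G|), where n is the length of the input string and |G| is the size of the grammar: there are O(n^2) cells, each cell tries O(n) split points, and each split point checks every production rule. The matrix has O(n^2) cells, each holding a subset of the non-terminals, so the space complexity is within O(n^2 * |G|).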
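The cubic factor can be made concrete by counting how many (i, k, j) combinations the position loops visit. The sketch below is an illustration, with `count_splits` a name assumed here; it compares the count with the closed form (n^3 - n) / 6. Each combination then costs one pass over the production rules, which is where the |G| factor comes from.

```python
def count_splits(n):
    """Count the (i, k, j) triples visited for an input of length n."""
    count = 0
    for length in range(2, n + 1):          # substring length
        for i in range(1, n - length + 2):  # start position
            j = i + length - 1              # end position
            for k in range(i, j):           # split point
                count += 1
    return count


for n in (4, 10, 100):
    # The two numbers on each line agree
    print(n, count_splits(n), (n**3 - n) // 6)
```

For the length-4 example string this gives 10 split checks, and the count grows proportionally to n^3.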

Conclusion

The CYK algorithm decides whether a string can be generated by a context-free grammar in Chomsky normal form, in time cubic in the length of the string. It has applications in fields such as computational linguistics and natural language processing.