📜  Cocke–Younger–Kasami (CYK) Algorithm (1)

📅  Last modified: 2023-12-03 15:14:11.312000             🧑  Author: Mango

CYK Algorithm

The Cocke-Younger-Kasami (CYK) algorithm is a dynamic programming algorithm for parsing strings against a context-free grammar given in Chomsky Normal Form (CNF). It determines whether the grammar can generate a given string and, if back-pointers are recorded while the table is filled, a valid derivation tree can be recovered as well.

How it works

The CYK algorithm builds a triangular table in which each cell holds the set of non-terminals that can generate one substring of the input string. Cells are indexed by the length of the substring and its starting position, so there is one cell for every contiguous substring.

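The algorithm fills the table bottom-up, in order of increasing substring length. Cells for substrings of length 1 come directly from the terminal rules A -> a. A cell for a longer substring is computed from pairs of cells for shorter substrings: for every way of splitting the substring into a left part and a right part, and for every rule A -> B C, the non-terminal A is added if B is in the cell for the left part and C is in the cell for the right part. Once the table is full, the input string can be generated by the grammar exactly when the top cell, which covers the whole string, contains the start symbol.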
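
To make the table-filling order concrete, here is a minimal sketch that fills the table for one small input and prints it row by row. The grammar (S -> A B | B C, A -> B A | a, B -> C C | b, C -> A B | a), the dictionary layout, and the names `TERMINAL_RULES`, `BINARY_RULES`, `fill_table` and `show` are assumptions chosen for illustration; they do not come from the article.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical CNF grammar, used only for illustration:
#   S -> A B | B C,  A -> B A | a,  B -> C C | b,  C -> A B | a
TERMINAL_RULES: Dict[str, Set[str]] = {"a": {"A", "C"}, "b": {"B"}}
BINARY_RULES: Dict[Tuple[str, str], Set[str]] = {
    ("A", "B"): {"S", "C"},
    ("B", "C"): {"S"},
    ("B", "A"): {"A"},
    ("C", "C"): {"B"},
}


def fill_table(s: str) -> List[List[Set[str]]]:
    """Return a table where table[length - 1][i] covers s[i:i + length]."""
    n = len(s)
    table: List[List[Set[str]]] = [[set() for _ in range(n - row)] for row in range(n)]
    # Length-1 substrings come straight from the terminal rules.
    for i, ch in enumerate(s):
        table[0][i] = set(TERMINAL_RULES.get(ch, set()))
    # Longer substrings: combine a left part and a right part at every split.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for left_len in range(1, length):
                left = table[left_len - 1][i]
                right = table[length - left_len - 1][i + left_len]
                for b in left:
                    for c in right:
                        table[length - 1][i] |= BINARY_RULES.get((b, c), set())
    return table


def show(s: str, table: List[List[Set[str]]]) -> None:
    """Print the table with the cell for the whole string on the first line."""
    for length in range(len(s), 0, -1):
        cells = ["{" + ",".join(sorted(cell)) + "}" for cell in table[length - 1]]
        print(f"length {length}: " + "  ".join(cells))


table = fill_table("baaba")
show("baaba", table)
accepted = "S" in table[-1][0]
print("accepted:", accepted)
```

Each printed row depends only on the rows printed below it, which is why the cells can be filled in order of increasing length without any recursion.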

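Pseudocode

# Inputs:
#     - input string: s
#     - CFG in Chomsky Normal Form: G = (V, T, P, S)
#       P is a list of rules (lhs, rhs); rhs is one terminal or a pair of non-terminals
# Outputs:
#     - Boolean indicating if the input string can be generated by the grammar
#     - The set of non-terminals that derive the whole string (the top cell);
#       building a parse tree additionally requires storing back-pointers

def cyk(s, P, S):
    n = len(s)
    if n == 0:
        return False, set()  # the empty string (a rule S -> epsilon) is not handled here

    # table[l-1][i] holds the non-terminals that derive the substring of length l starting at i
    table = [[set() for _ in range(n-i)] for i in range(n)]

    # Substrings of length 1: apply the terminal rules A -> a
    for i in range(n):
        for rule in P:
            if rule[1] == s[i]:
                table[0][i].add(rule[0])

    # Longer substrings: try every split point k and every binary rule A -> B C
    for l in range(2, n+1):
        for i in range(n-l+1):
            j = i + l - 1
            for k in range(i, j):
                for rule in P:
                    if len(rule[1]) == 2:
                        # B must derive s[i..k] and C must derive s[k+1..j]
                        if rule[1][0] in table[k-i][i] and rule[1][1] in table[j-k-1][k+1]:
                            table[l-1][i].add(rule[0])

    return S in table[n-1][0], table[n-1][0]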
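
The listing above only answers whether the string is generated. One common way to also obtain a parse tree is to record, for each non-terminal placed in a cell, the split and the pair of symbols that produced it. The following is a sketch of that idea under assumptions: the function names `cyk_parse` and `leaves`, the nested-tuple tree format, and the example grammar are illustrative and not taken from the article.

```python
from typing import Dict, List, Optional, Tuple, Union

Rhs = Union[str, Tuple[str, str]]   # a terminal, or a pair of non-terminals
Rule = Tuple[str, Rhs]              # (lhs, rhs)


def cyk_parse(s: str, rules: List[Rule], start: str) -> Optional[tuple]:
    """Return one parse tree as nested tuples, or None if s is not generated."""
    n = len(s)
    if n == 0:
        return None  # the empty string is not handled in this sketch
    # back[(length, i)] maps each non-terminal deriving s[i:i + length]
    # to one way of deriving it (a back-pointer).
    back: Dict[Tuple[int, int], Dict[str, object]] = {}
    for i, ch in enumerate(s):
        cell: Dict[str, object] = {}
        for lhs, rhs in rules:
            if rhs == ch:
                cell.setdefault(lhs, ch)
        back[(1, i)] = cell
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = {}
            for split in range(1, length):
                left = back[(split, i)]
                right = back[(length - split, i + split)]
                for lhs, rhs in rules:
                    if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                        # Keep only the first derivation found for each symbol.
                        cell.setdefault(lhs, (split, rhs[0], rhs[1]))
            back[(length, i)] = cell

    def build(sym: str, length: int, i: int) -> tuple:
        entry = back[(length, i)][sym]
        if length == 1:
            return (sym, entry)                      # leaf: (non-terminal, terminal)
        split, b, c = entry
        return (sym, build(b, split, i), build(c, length - split, i + split))

    if start not in back[(n, 0)]:
        return None
    return build(start, n, 0)


def leaves(tree: tuple) -> str:
    """Concatenate the terminals at the leaves of a tree, left to right."""
    if len(tree) == 2:
        return tree[1]
    return leaves(tree[1]) + leaves(tree[2])


# Hypothetical grammar: S -> A B | B C, A -> B A | a, B -> C C | b, C -> A B | a
RULES: List[Rule] = [
    ("S", ("A", "B")), ("S", ("B", "C")), ("A", ("B", "A")),
    ("B", ("C", "C")), ("C", ("A", "B")),
    ("A", "a"), ("B", "b"), ("C", "a"),
]

tree = cyk_parse("baaba", RULES, "S")
print(tree)
print(leaves(tree) if tree else "not generated")
```

The sketch keeps a single back-pointer per non-terminal, so it returns one tree even when the grammar is ambiguous; storing a list of back-pointers instead would allow all trees to be enumerated.
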
Complexity

The CYK algorithm has a time complexity of O(n^3 * |G|) and a space complexity of O(n^2 * |G|), where n is the length of the input string and |G| is the size of the grammar. The n^3 factor arises because the table has O(n^2) cells and each cell tries up to n - 1 split points, checking the grammar rules at each one. This cost is practical for small grammars and short input strings, which is why CYK and its probabilistic variants have been used in natural language processing applications such as speech recognition and machine translation.
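
The cubic factor can be checked by counting how many (length, start, split) combinations the three outer loops visit. The sketch below does that count directly and compares it with the closed form (n^3 - n) / 6; the function name `count_split_points` is an assumption made for this illustration.

```python
def count_split_points(n: int) -> int:
    """Count the (length, start, split) triples visited for an input of length n."""
    count = 0
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            # A substring of this length has length - 1 possible split points.
            for _k in range(i, i + length - 1):
                count += 1
    return count


for n in (1, 2, 4, 8, 16, 32):
    # The closed form (n^3 - n) / 6 grows as n^3.
    print(n, count_split_points(n), (n**3 - n) // 6)
```

Every counted triple is checked against each binary rule, which is where the additional grammar-size factor in the time bound comes from.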

Conclusion

The CYK algorithm is a standard tool for deciding membership and parsing with context-free grammars in Chomsky Normal Form. Its dynamic programming approach guarantees polynomial running time for any such grammar, although the cubic dependence on input length makes long inputs expensive. With its applications in natural language processing, it is a valuable algorithm for any programmer to know.