📜  Cocke–Younger–Kasami (CYK) 算法(1)

📅  最后修改于: 2023-12-03 15:14:11.303000             🧑  作者: Mango

Cocke-Younger-Kasami (CYK) Algorithm

The Cocke-Younger-Kasami (CYK) algorithm is a dynamic programming algorithm used for parsing context-free grammars. It is named after its inventors, John Cocke, Daniel Younger, and Tadao Kasami.

Basic Idea

The basic idea of the algorithm is to build a parse tree of a given input sentence based on the rules of a context-free grammar. The algorithm works by breaking the input sentence down into its constituent parts, known as terminals and non-terminals, and then checking whether these parts can be combined according to the rules of the grammar to form the sentence.

Algorithm Steps

The CYK algorithm can be broken down into the following steps:

  1. Convert the context-free grammar to Chomsky normal form (CNF).
  2. Tokenize the input sentence and map each word to its corresponding terminal symbol in the grammar.
  3. Initialize a two-dimensional array to store the possible non-terminals that can produce each substring of the input sentence.
  4. Fill in the array by iterating over all possible substrings of the input sentence, and considering all possible ways of combining the non-terminals to produce each substring.
  5. Return the top cell of the array, which contains the start symbol of the grammar if the input sentence is valid.
Code Example

Here is an implementation of the CYK algorithm in Python:

def cyk(grammar, sentence):
    # Step 1: Convert the grammar to CNF
    cnf = grammar.convert_to_cnf()

    # Step 2: Tokenize the input sentence and map to terminals
    tokens = sentence.split()
    terminals = [grammar.get_terminal(token) for token in tokens]

    # Step 3: Initialize the array
    n = len(terminals)
    chart = [[set() for _ in range(n)] for _ in range(n)]

    # Step 4: Fill in the array
    for i in range(n):
        chart[i][i].update(rule.head for rule in cnf if rule.body == (terminals[i],))

        for j in range(i):
            for k in range(j, i):
                chart[j][i].update(rule.head for rule in cnf
                                    for left in chart[j][k]
                                    for right in chart[k+1][i]
                                    if rule.body == (left, right))

    # Step 5: Check if the input sentence is valid
    return cnf.start in chart[0][n-1]

Note that this implementation assumes that the context-free grammar is represented as an object with the following methods:

  • convert_to_cnf: Converts the grammar to Chomsky normal form.
  • get_terminal: Maps a token to its corresponding terminal symbol in the grammar.
  • start: Returns the start symbol of the grammar.
Conclusion

The Cocke-Younger-Kasami algorithm is an important algorithm in natural language processing and computational linguistics. It can be used to parse sentences and generate parse trees according to a context-free grammar. The algorithm is based on dynamic programming and works by filling in a table with possible non-terminals that can produce each substring of the input sentence.