📜  用于模式搜索的 Rabin-Karp 算法的Python程序

📅  最后修改于: 2022-05-13 01:56:56.811000             🧑  作者: Mango

用于模式搜索的 Rabin-Karp 算法的Python程序

给定一个文本txt[0..n-1]和一个模式pat[0..m-1] ,编写一个函数search(char pat[], char txt[])打印txt中所有出现的pat[] [] 。你可以假设 n > m。

例子:

输入:txt[] = "THIS IS A TEST TEXT" pat[] = "TEST" 输出:在索引 10 找到的模式 输入:txt[] = "AABAACAADAABAABA" pat[] = "AABA" 输出:在索引 0 找到的模式在索引 9 找到的模式 在索引 12 找到的模式模式搜索

朴素字符串匹配算法会一张一张地滑动模式。每张幻灯片后,它会逐一检查当前班次的字符,如果所有字符都匹配,则打印匹配项。
与朴素算法一样,Rabin-Karp 算法也将图案一一滑动。但与 Naive 算法不同的是,Rabin Karp 算法将模式的哈希值与当前文本子串的哈希值匹配,如果哈希值匹配,则只有它开始匹配单个字符。所以 Rabin Karp 算法需要为后面的字符串计算哈希值。

1) 图案本身。
2) 长度为 m 的文本的所有子串。

Python
# Following program is the python implementation of
# Rabin Karp Algorithm given in CLRS book
  
# d is the number of characters in the input alphabet
d = 256
  
# pat  -> pattern
# txt  -> text
# q    -> A prime number
  
def search(pat, txt, q):
    M = len(pat)
    N = len(txt)
    i = 0
    j = 0
    p = 0    # hash value for pattern
    t = 0    # hash value for txt
    h = 1
  
    # The value of h would be "pow(d, M-1)% q"
    for i in xrange(M-1):
        h = (h * d)% q
  
    # Calculate the hash value of pattern and first window
    # of text
    for i in xrange(M):
        p = (d * p + ord(pat[i]))% q
        t = (d * t + ord(txt[i]))% q
  
    # Slide the pattern over text one by one
    for i in xrange(N-M + 1):
        # Check the hash values of current window of text and
        # pattern if the hash values match then only check
        # for characters on by one
        if p == t:
            # Check for characters one by one
            for j in xrange(M):
                if txt[i + j] != pat[j]:
                    break
  
            j+= 1
            # if p == t and pat[0...M-1] = txt[i, i + 1, ...i + M-1]
            if j == M:
                print "Pattern found at index " + str(i)
  
        # Calculate hash value for next window of text: Remove
        # leading digit, add trailing digit
        if i < N-M:
            t = (d*(t-ord(txt[i])*h) + ord(txt[i + M]))% q
  
            # We might get negative values of t, converting it to
            # positive
            if t < 0:
                t = t + q
  
# Driver program to test the above function
txt = "GEEKS FOR GEEKS"
pat = "GEEK"
q = 101 # A prime number
search(pat, txt, q)
  
# This code is contributed by Bhavya Jain


输出:
Pattern found at index 0
Pattern found at index 10

有关更多详细信息,请参阅有关模式搜索的 Rabin-Karp 算法的完整文章!