K-th Lexicographically Smallest Unique Substring of a Given String (1)

Last modified: 2023-12-03 15:11:39.948000 | Author: Mango

String processing often requires finding particular unique substrings. Given a string, how do we find its K-th lexicographically smallest unique substring?

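Definitions

Lexicographic order: dictionary order. Two strings are compared character by character from left to right; the first differing character decides, and a proper prefix comes before any string that extends it.
Unique substring: a substring that is not counted more than once. Each different substring of the string is considered a single time, however often it occurs.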
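
To make the two definitions concrete, here is a minimal brute-force sketch. It assumes that "unique" means "distinct" (each different substring counted once), and the function names `distinct_substrings_sorted` and `kth_substring_bruteforce` are hypothetical, chosen only for this illustration. It is meant as a reference for small inputs, not as an efficient solution.

```python
def distinct_substrings_sorted(text):
    """Return all distinct non-empty substrings of text in lexicographic order."""
    # A set keeps one copy of each substring even if it occurs several times.
    subs = {text[i:j] for i in range(len(text)) for j in range(i + 1, len(text) + 1)}
    # Python compares str values character by character, left to right,
    # and a proper prefix sorts before any longer string that extends it.
    return sorted(subs)


def kth_substring_bruteforce(text, k):
    """Return the k-th (1-based) smallest distinct substring, or None if k is out of range."""
    ordered = distinct_substrings_sorted(text)
    return ordered[k - 1] if 1 <= k <= len(ordered) else None


print(distinct_substrings_sorted("aba"))   # ['a', 'ab', 'aba', 'b', 'ba']
print(kth_substring_bruteforce("aba", 4))  # b
```

In "aba" the substring "a" occurs twice but appears only once in the sorted list, which is the behaviour the definition of a unique substring asks for.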

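Approach

We can store every substring of the given string in a trie and then run a depth-first search (DFS) over the trie to enumerate the distinct substrings in lexicographic order.

The procedure is as follows:

Build the trie.

Insert every suffix of the string into the trie, one character at a time from left to right. If the current node has no child for the next character, create one; otherwise follow the existing child. Since every substring is a prefix of some suffix, each non-root node then corresponds to exactly one distinct substring, and substrings that occur more than once share a single path.

Search the distinct substrings with DFS.

Start the DFS at the root and visit the children of each node in ascending character order. Because the trie has already merged repeated substrings, every node is reached exactly once, so no separate "visited" bookkeeping is needed to avoid counting a substring twice.

Compare in lexicographic order and find the K-th smallest substring.

With children visited in ascending order, a pre-order traversal reaches the nodes in lexicographic order: a string is counted before any of its extensions, and before anything under a larger sibling. We therefore count nodes as they are reached and stop at the K-th one. The following pseudocode outlines this: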

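# trie, root and K are assumed to be given
s = ''    # substring spelled by the path from the root to the current node
cnt = 0   # number of distinct substrings counted so far
ans = ''  # the K-th smallest distinct substring; stays '' if fewer than K exist
def dfs(u):
    global s, cnt, ans
    if u != root:
        # every non-root node is one distinct substring
        cnt += 1
        if cnt == K:
            ans = s
            return
    # visit children in ascending character order so that the
    # pre-order traversal follows lexicographic order
    for v in sorted(trie[u].children):
        if cnt >= K:
            return  # answer already found, stop early
        s += v
        dfs(trie[u].children[v])
        s = s[:-1]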

Implementation

The following Python code implements the approach described above:

class TrieNode:
    def __init__(self):
        self.children = {}  # maps a character to the child TrieNode


class Trie:
    def __init__(self, text=''):
        self.root = TrieNode()
        # Insert every suffix so that each non-root node stands for
        # exactly one distinct substring of text.
        for i in range(len(text)):
            self.insert(text[i:])

    def insert(self, word):
        node = self.root
        for c in word:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]

    def dfs(self, k):
        # Return the k-th (1-based) smallest distinct substring,
        # or '' if the string has fewer than k distinct substrings.
        self.cnt = 0   # number of distinct substrings counted so far
        self.ans = ''
        self._dfs(self.root, [], k)
        return self.ans

    def _dfs(self, node, path, k):
        # Returns True once the k-th substring has been found.
        if node is not self.root:
            self.cnt += 1
            if self.cnt == k:
                self.ans = ''.join(path)
                return True
        # Ascending character order makes the pre-order traversal lexicographic.
        for c in sorted(node.children):
            path.append(c)
            found = self._dfs(node.children[c], path, k)
            path.pop()
            if found:
                return True
        return False


# Note: the recursion depth equals the length of the input string.
print(Trie('aba').dfs(4))  # b
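
A recursive DFS descends one call per character, so for inputs longer than Python's default recursion limit (typically 1000) it may raise `RecursionError`. The sketch below shows one possible way around that: the same pre-order traversal with an explicit stack. It is an alternative formulation, not the code above; the nested-dict trie and the names `build_suffix_trie` and `kth_substring_iterative` are assumptions made for this example.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie made of nested dicts."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def kth_substring_iterative(text, k):
    """Return the k-th (1-based) smallest distinct substring, or None if there are fewer than k."""
    root = build_suffix_trie(text)
    # Each stack entry is (trie node, substring spelled by the path to it).
    # Children are pushed in descending order so the smallest is popped first.
    stack = [(root[ch], ch) for ch in sorted(root, reverse=True)]
    count = 0
    while stack:
        node, prefix = stack.pop()
        count += 1              # every popped node is one distinct substring
        if count == k:
            return prefix
        for ch in sorted(node, reverse=True):
            stack.append((node[ch], prefix + ch))
    return None


print(kth_substring_iterative("aba", 4))  # b
```

Keeping the full prefix in each stack entry costs extra memory but keeps the sketch short; storing only the last character plus a parent link would be a leaner variant.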

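Summary

Storing all substrings of a string in a trie is a common technique in string processing. Here the trie holds each distinct substring of the input exactly once, and a DFS that visits children in ascending character order finds the K-th smallest one. The cost is high, since the trie can have up to n(n+1)/2 nodes for a string of length n, so the approach is practical mainly for short inputs.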
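
To get a feel for the cost, the following sketch counts the nodes of a trie that holds every suffix of a string. The node count equals the number of distinct substrings, which is at most n(n+1)/2 for a string of length n. The nested-dict representation and the name `count_trie_nodes` are assumptions made for this illustration.

```python
def count_trie_nodes(text):
    """Build a suffix trie for text and return its number of non-root nodes."""
    root = {}
    nodes = 0
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node:
                node[ch] = {}
                nodes += 1      # a new node means a substring not seen before
            node = node[ch]
    return nodes


for text in ("aaaa", "abab", "abcd"):
    print(text, count_trie_nodes(text))
# aaaa 4
# abab 7
# abcd 10
```

A string of four different characters already reaches the maximum of 4 * 5 / 2 = 10 nodes, while a highly repetitive string such as "aaaa" needs only 4.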