Last modified: 2022-05-13 01:55:49.202000 | Author: Mango

Remove Duplicate Words from a Sentence Using Regular Expressions

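Given a string str representing a sentence, the task is to remove duplicate words (words repeated consecutively) from the sentence using a regular expression in Java. Equivalent C++ and Python versions are also given.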
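Examples:

Input: str = "Good bye bye world world"
Output: Good bye world

Input: str = "Ram went went to to his home"
Output: Ram went to his home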

Approach

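  1. Get the sentence.
  2. Form a regular expression that matches a word followed by one or more repeats of itself:
     regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
  3. The parts of this regular expression are as follows (the doubled backslashes are string-literal escaping in Java and C++):
    • "\\b": a word boundary. The boundaries stop the pattern from matching inside a longer word. For example, in "My thesis is great", the "is" at the end of "thesis" is not treated as a repeat of the following "is".
    • "(\\w+)": one or more word characters, [a-zA-Z_0-9], captured as group 1.
    • "\\W+": one or more non-word characters, [^\w], that is, the separator between the repeats.
    • "\\1": a backreference that matches the same text as the first parenthesized group, here (\w+).
    • "+": matches one or more occurrences of whatever precedes it. Applied to the non-capturing group (?: ... ), it lets the pattern cover any number of further repeats.
  4. Match the sentence against the regular expression and replace each match with group 1. In Java, the matching is done with Pattern.matcher().
  5. Return the modified sentence.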
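
As a rough illustration of why both word boundaries matter, the sketch below compares the pattern with two deliberately weakened variants. The names FULL, NO_LEADING_BOUNDARY, NO_TRAILING_BOUNDARY and collapse are made up for this example and do not come from the article's code.

```python
import re

# The pattern from the approach above.
FULL = re.compile(r'\b(\w+)(?:\W+\1\b)+', re.IGNORECASE)

# Hypothetical weakened variants, for comparison only.
NO_LEADING_BOUNDARY = re.compile(r'(\w+)(?:\W+\1\b)+', re.IGNORECASE)
NO_TRAILING_BOUNDARY = re.compile(r'\b(\w+)(?:\W+\1)+', re.IGNORECASE)


def collapse(pattern, sentence):
    """Replace every match of `pattern` with its first captured word."""
    return pattern.sub(r'\1', sentence)


# Leading boundary: keeps the "is" inside "thesis" from starting a match.
print(collapse(FULL, "My thesis is great"))                 # My thesis is great
print(collapse(NO_LEADING_BOUNDARY, "My thesis is great"))  # My thesis great

# Trailing boundary: keeps "the" from matching the start of "theme".
print(collapse(FULL, "the theme"))                          # the theme
print(collapse(NO_TRAILING_BOUNDARY, "the theme"))          # theme

# The quantifier on the non-capturing group handles longer runs.
print(collapse(FULL, "bye bye bye world"))                  # bye world
```

Without either boundary the pattern damages sentences that contain no repeated word at all, which is the special case the "\\b" bullet refers to.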

Below is the implementation of the above approach:

C++
// C++ program to remove duplicate words
// using Regular Expression or ReGex.
#include <iostream>
#include <regex>
#include <string>
using namespace std;
 
// Function to collapse each run of consecutively
// repeated words into a single occurrence
string removeDuplicateWords(string s)
{
 
  // Regex matching a word followed by one or more
  // repeats of itself (case-insensitive).
  const regex pattern("\\b(\\w+)(?:\\W+\\1\\b)+", regex_constants::icase);
 
  // Replace every match with its first captured word ($1).
  // regex_replace substitutes at the matched positions, so
  // identical text elsewhere in the sentence is left untouched.
  return regex_replace(s, pattern, "$1");
}
 
// Driver Code
int main()
{
  // Test Case: 1
  string str1
      = "Good bye bye world world";
  cout << removeDuplicateWords(str1) << endl;
 
  // Test Case: 2
  string str2
      = "Ram went went to to his home";
  cout << removeDuplicateWords(str2) << endl;
 
  // Test Case: 3
  string str3
      = "Hello hello world world";
  cout << removeDuplicateWords(str3) << endl;
 
  return 0;
}
 
// This code is contributed by yuvraj_chandra


Java
// Java program to remove duplicate words
// using Regular Expression or ReGex.
 
import java.util.regex.Matcher;
import java.util.regex.Pattern;
 
class GFG {
 
    // Function to collapse each run of consecutively
    // repeated words into a single occurrence
    public static String
    removeDuplicateWords(String input)
    {
 
        // Regex matching a word followed by one or
        // more repeats of itself.
        String regex
            = "\\b(\\w+)(?:\\W+\\1\\b)+";
        Pattern p
            = Pattern.compile(
                regex,
                Pattern.CASE_INSENSITIVE);
 
        // Pattern.matcher() creates a Matcher that
        // applies the compiled pattern to the sentence.
        Matcher m = p.matcher(input);
 
        // Replace every match with its first captured
        // word ("$1"). Matcher.replaceAll substitutes
        // at the matched positions, so punctuation in
        // the matched text is never reinterpreted as
        // regex syntax.
        return m.replaceAll("$1");
    }
 
    // Driver code
    public static void main(String args[])
    {
 
        // Test Case: 1
        String str1
            = "Good bye bye world world";
        System.out.println(
            removeDuplicateWords(str1));
 
        // Test Case: 2
        String str2
            = "Ram went went to to his home";
        System.out.println(
            removeDuplicateWords(str2));
 
        // Test Case: 3
        String str3
            = "Hello hello world world";
        System.out.println(
            removeDuplicateWords(str3));
    }
}


Python3
# Python program to remove duplicate words
# using Regular Expression or ReGex.
import re
 
 
# Function to collapse each run of consecutively
# repeated words into a single occurrence
def removeDuplicateWords(sentence):
 
    # Regex matching a word followed by one or more repeats of itself
    regex = r'\b(\w+)(?:\W+\1\b)+'
 
    # Replace every match with its first captured word
    return re.sub(regex, r'\1', sentence, flags=re.IGNORECASE)
 
 
# Driver Code
 
# Test Case: 1
str1 = "Good bye bye world world"
print(removeDuplicateWords(str1))
 
# Test Case: 2
str2 = "Ram went went to to his home"
print(removeDuplicateWords(str2))
 
# Test Case: 3
str3 = "Hello hello world world"
print(removeDuplicateWords(str3))
 
# This code is contributed by yuvraj_chandra


Output:
Good bye world
Ram went to his home
Hello world
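
The pattern only collapses repeats that are adjacent to each other. The short sketch below probes that behavior on a few extra inputs; the helper name remove_adjacent_duplicates and the sample sentences are made up for this illustration.

```python
import re

# Same pattern as in the implementations above.
PATTERN = re.compile(r'\b(\w+)(?:\W+\1\b)+', re.IGNORECASE)


def remove_adjacent_duplicates(sentence):
    """Collapse each run of consecutively repeated words into its first word."""
    return PATTERN.sub(r'\1', sentence)


# Punctuation between the repeats is part of the match, so it is dropped too.
print(remove_adjacent_duplicates("Hello, hello world"))  # Hello world

# Repeats separated by other words are not adjacent and are left alone.
print(remove_adjacent_duplicates("go home and go"))      # go home and go
```

If repeats that are separated by other words must also be removed, a different technique (for example, tracking already-seen words in a set) would be needed; the regular expression alone does not cover that case.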