📜  Regex Boundary Matchers in Java

📅  Last modified: 2022-05-13 01:55:12.151000             🧑  Author: Mango

Prerequisite - Regular Expressions in Java

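Boundary matchers help us pin down where in a string a match occurs. You can make your pattern matches more precise by specifying this information with boundary matchers. For example, maybe you are interested in finding a particular word, but only if it appears at the beginning or end of a line. Or maybe you want to know whether the match takes place on a word boundary, or at the end of the previous match.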

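List of boundary matchers

  • ^ - the beginning of a line
  • $ - the end of a line
  • \b - a word boundary
  • \B - a non-word boundary
  • \A - the beginning of the input
  • \G - the end of the previous match
  • \Z - the end of the input but for the final terminator, if any
  • \z - the end of the input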
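
The list above includes \A, \Z and \z, which the cases below do not exercise. The following is a minimal sketch of how they differ on an input that ends with a line terminator. The class name InputAnchors and the helper found are hypothetical names chosen for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares \A, \Z and \z on input ending in "\n".
class InputAnchors
{
    // Returns true if the regex is found anywhere in the text.
    static boolean found(String regex, String text)
    {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args)
    {
        String txt = "geeks\n"; // input ends with a line terminator

        // \A anchors at the very beginning of the input
        System.out.println(found("\\Ageeks", txt));    // true

        // \Z allows one final line terminator after the match
        System.out.println(found("geeks\\Z", txt));    // true

        // \z is the absolute end, so the trailing "\n" gets in the way
        System.out.println(found("geeks\\z", txt));    // false

        // Consuming the terminator explicitly lets \z succeed
        System.out.println(found("geeks\\n\\z", txt)); // true
    }
}
```

A typical use is validating a line read from a file: \Z tolerates the trailing newline, while \z rejects it.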

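Case 1: Matching words with ^ and $

  • ^ - matches the beginning of a line (by default, the beginning of the whole input)
  • $ - matches the end of a line (by default, the end of the whole input)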
  • Input : txt = "geeksforgeeks", regex = "^geeks"
    Output : Found from index 0 to 5
    Explanation : Note that the result doesn't include "geeks" after
                  "for" as we have used ^ in regex.
  • Input : txt = "geeksforgeeks", regex = "geeks$"
    Output : Found from index 8 to 13.
    Explanation : Note that the result doesn't include "geeks" before 
                 "for" as we have used $ in regex.
  • Input : txt = "geeksforgeeks", regex = "^geeks$"
    Output : No match found
    Explanation : The given regex matches only the exact string "geeks".
  • Input : txt = "  geeksforgeeks", regex = "^geeks"
    Output: No match found.
    Explanation : The input string contains extra whitespace at the beginning.
  • // Extra \ is used to escape one \
    Input : txt = "  geeksforgeeks", regex : "^\\s+geeks"
    Output: Found from index 0 to 7.
    Explanation : The pattern specifies geeks after one or more spaces.
// Java program to demonstrate that ^ matches the beginning of
// a line, and $ matches the end.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
  
class Reg
{
    public static void main(String[] args)
    {
        String txt = "geeksforgeeks";
  
        // Demonstrating ^
        String regex1 = "^geeks";
        Pattern pattern1 = Pattern.compile(regex1, Pattern.CASE_INSENSITIVE);
        Matcher matcher1 = pattern1.matcher(txt);
        while (matcher1.find())
        {
            System.out.println("Start index: " + matcher1.start());
            System.out.println("End index: " + matcher1.end());
        }
  
        // Demonstrating $
        String regex2 = "geeks$";
        Pattern pattern2 = Pattern.compile(regex2, Pattern.CASE_INSENSITIVE);
        Matcher matcher2 = pattern2.matcher(txt);
        while (matcher2.find())
        {
            System.out.println("\nStart index: " + matcher2.start());
            System.out.println("End index: " + matcher2.end());
        }
    }
}

Output:

Start index: 0
End index: 5

Start index: 8
End index: 13
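
The program above uses single-line input, where "beginning of a line" and "beginning of the input" coincide. As a rough sketch of the difference on multi-line input, the example below compiles the same regex with and without Pattern.MULTILINE. The class name MultilineAnchors and the helper starts are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows how Pattern.MULTILINE changes ^.
class MultilineAnchors
{
    // Collects the start index of every match of regex in text.
    static List<Integer> starts(String regex, int flags, String text)
    {
        List<Integer> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex, flags).matcher(text);
        while (matcher.find())
        {
            result.add(matcher.start());
        }
        return result;
    }

    public static void main(String[] args)
    {
        String txt = "geeks for\ngeeks"; // two lines

        // Default mode: ^ matches only at the start of the input
        System.out.println(starts("^geeks", 0, txt));                 // [0]

        // MULTILINE mode: ^ also matches right after a line terminator
        System.out.println(starts("^geeks", Pattern.MULTILINE, txt)); // [0, 10]
    }
}
```

The same flag affects $ in the corresponding way, so it then matches before each line terminator as well as at the end of the input.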

Case 2: Using \b to check whether a pattern begins or ends on a word boundary

  • Input: txt = "geeksforgeeks geekspractice", pat = "\\bgeeks"
    Output: Found from index 0 to 5 and from index 14 to 19
    Explanation : The pattern "geeks" is present at the beginning
                  of two words "geeksforgeeks" and "geekspractice"
    
    
  • Input: txt = "geeksforgeeks geekspractice", pat = "geeks\\b"
    Output: Found from index 8 to 13
    Explanation : The pattern "geeks" is present at the end of one
                  word "geeksforgeeks"
    
// Java program to demonstrate use of \b to match 
// regex at beginning and end of word boundary 
import java.util.regex.Matcher; 
import java.util.regex.Pattern; 
  
class Reg 
{ 
    public static void main(String[] args) 
    { 
        String txt = "geeksforgeeks geekspractice"; 
  
        // Demonstrating beginning of word boundary 
        String regex1 = "\\bgeeks"; // Matched at two places 
        Pattern pattern1 = Pattern.compile(regex1, Pattern.CASE_INSENSITIVE); 
        Matcher matcher1 = pattern1.matcher(txt); 
        while (matcher1.find()) 
        { 
            System.out.println("Start index: " + matcher1.start()); 
            System.out.println("End index: " + matcher1.end()); 
        } 
  
        // Demonstrating end of word boundary 
        String regex2 = "geeks\\b"; // Matched at one place 
        Pattern pattern2 = Pattern.compile(regex2, Pattern.CASE_INSENSITIVE); 
        Matcher matcher2 = pattern2.matcher(txt); 
        while (matcher2.find()) 
        { 
            System.out.println("\nStart index: " + matcher2.start()); 
            System.out.println("End index: " + matcher2.end()); 
        } 
    } 
} 

Output:

Start index: 0
End index: 5
Start index: 14
End index: 19

Start index: 8
End index: 13
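
A common follow-up to the two patterns above is to put \b on both sides so that only whole words match. The sketch below assumes a hypothetical class WholeWord with a helper count; Pattern.quote is used so that the search word is treated literally rather than as a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: counts whole-word occurrences using \b on both sides.
class WholeWord
{
    // Counts occurrences of word that are delimited by word boundaries.
    static int count(String word, String text)
    {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher matcher = pattern.matcher(text);
        int n = 0;
        while (matcher.find())
        {
            n++;
        }
        return n;
    }

    public static void main(String[] args)
    {
        // Only the first and last "geeks" stand alone as words
        System.out.println(count("geeks", "geeks geeksforgeeks geeks")); // 2

        // Neither "geeks" inside "geeksforgeeks" is a whole word
        System.out.println(count("geeks", "geeksforgeeks"));             // 0
    }
}
```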

Case 3: Using \B to match an expression on a non-word boundary

  • Input: txt = "geeksforgeeks geekspractice", pat = "\\Bgeeks"
    Output: Found from index 8 to 13
    Explanation : Only one occurrence of "geeks" does not start at the
                  beginning of a word: the one at the end of "geeksforgeeks"
    
    
    
  • Input: txt = "geeksforgeeks geekspractice", pat = "geeks\\B"
    Output: Found from index 0 to 5 and from index 14 to 19
    Explanation : Two occurrences of "geeks" are not present at the end
                  of word.
    
// Java program to demonstrate use of \B to match 
// regex at beginning and end of non word boundary 
import java.util.regex.Matcher; 
import java.util.regex.Pattern; 
  
class Reg 
{ 
    public static void main(String[] args) 
    { 
        String txt = "geeksforgeeks geekspractice"; 
  
        // Demonstrating Not beginning of word 
        String regex1 = "\\Bgeeks"; // Matches at one place 
        Pattern pattern1 = Pattern.compile(regex1, Pattern.CASE_INSENSITIVE); 
        Matcher matcher1 = pattern1.matcher(txt); 
        while (matcher1.find()) 
        { 
            System.out.println("Start index: " + matcher1.start()); 
            System.out.println("End index: " + matcher1.end() + "\n"); 
        } 
  
        // Demonstrating Not end of word 
        String regex2 = "geeks\\B"; // Matches at two places 
        Pattern pattern2 = Pattern.compile(regex2, Pattern.CASE_INSENSITIVE); 
        Matcher matcher2 = pattern2.matcher(txt); 
        while (matcher2.find()) 
        { 
            System.out.println("Start index: " + matcher2.start()); 
            System.out.println("End index: " + matcher2.end()); 
        } 
    } 
} 

Output:

Start index: 8
End index: 13

Start index: 0
End index: 5
Start index: 14
End index: 19
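
Combining the two patterns above, \B on both sides finds occurrences that are strictly embedded inside a longer word. This is only a sketch: the class name InsideWord and the helper embeddedStarts are hypothetical, and the sample text is made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: finds a substring that has word characters on both sides.
class InsideWord
{
    // Returns the start index of each occurrence of part that neither
    // begins nor ends on a word boundary.
    static List<Integer> embeddedStarts(String part, String text)
    {
        List<Integer> result = new ArrayList<>();
        Pattern pattern = Pattern.compile("\\B" + Pattern.quote(part) + "\\B");
        Matcher matcher = pattern.matcher(text);
        while (matcher.find())
        {
            result.add(matcher.start());
        }
        return result;
    }

    public static void main(String[] args)
    {
        // "geeks" at index 0 starts a word and "geeks" at index 8 ends one,
        // so only the occurrence inside "ingeeksout" (index 16) qualifies.
        String txt = "geeksforgeeks ingeeksout";
        System.out.println(embeddedStarts("geeks", txt)); // [16]
    }
}
```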

Case 4: Using \G to match only at the end of the previous match:

  • Input: txt = "geeksgeeks geeks", pat = "\\Ggeeks"
    Output: Found from index 0 to 5 and from 5 to 10
    Explanation : Only the first two occurrences of "geeks" in the text
                  match. The occurrence after the space doesn't match
                  because it does not start right after the previous match.
    
// Java program to demonstrate use of \G to match 
// to occur only at the end of the previous match 
import java.util.regex.Matcher; 
import java.util.regex.Pattern; 
  
class Reg 
{ 
    public static void main(String[] args) 
    { 
        String txt = "geeksgeeks geeks"; 
  
        // Demonstrating \G 
        String regex1 = "\\Ggeeks"; // Matches with first two geeks 
        Pattern pattern1 = Pattern.compile(regex1, Pattern.CASE_INSENSITIVE); 
        Matcher matcher1 = pattern1.matcher(txt); 
        while (matcher1.find()) 
        { 
            System.out.println("Start index: " + matcher1.start()); 
            System.out.println("End index: " + matcher1.end()); 
        } 
    } 
} 

Output:

Start index: 0
End index: 5
Start index: 5
End index: 10
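
One consequence of \G worth seeing in code: on the first find() there is no previous match, so \G anchors at the start of the input. The sketch below counts how many copies of a unit appear back to back from the very beginning. The class name PreviousMatchAnchor and the helper countContiguous are hypothetical names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: counts back-to-back repetitions using \G.
class PreviousMatchAnchor
{
    // Counts how many times unit repeats contiguously from index 0.
    static int countContiguous(String unit, String text)
    {
        Pattern pattern = Pattern.compile("\\G" + Pattern.quote(unit));
        Matcher matcher = pattern.matcher(text);
        int n = 0;
        while (matcher.find())
        {
            n++;
        }
        return n;
    }

    public static void main(String[] args)
    {
        // Two contiguous copies, then a space breaks the chain
        System.out.println(countContiguous("geeks", "geeksgeeks geeks")); // 2

        // A leading space means the first attempt at index 0 fails,
        // and \G cannot match anywhere else, so nothing is found
        System.out.println(countContiguous("geeks", " geeksgeeks"));      // 0
    }
}
```

This behavior makes \G handy for simple tokenizers that must stop at the first piece of input they cannot recognize.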

Reference: https://docs.oracle.com/javase/tutorial/essential/regex/bounds.html