📜  Java - Regular Expressions

📅  Last modified: 2020-12-21 01:40:48             🧑  Author: Mango


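Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to those of the Perl programming language and are easy to learn.

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data.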

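The java.util.regex package consists primarily of the following three classes:

  • Pattern class - A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you call one of its public static compile() methods, which return a Pattern object. These methods accept a regular expression as the first argument.

  • Matcher class - A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by calling the matcher() method on a Pattern object.

  • PatternSyntaxException - A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.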
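
As a minimal sketch of how Pattern and Matcher fit together, the following compiles the same expression with and without the Pattern.CASE_INSENSITIVE flag and obtains a Matcher from each. The class name CompileSketch and the helper contains are hypothetical, introduced only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompileSketch {

   // Hypothetical helper: reports whether the regex occurs anywhere in the input.
   static boolean contains(String regex, int flags, String input) {
      Pattern p = Pattern.compile(regex, flags);   // no public constructor; use compile()
      Matcher m = p.matcher(input);                // a Matcher is obtained from the Pattern
      return m.find();
   }

   public static void main(String[] args) {
      System.out.println(contains("java", 0, "I like Java"));                         // false
      System.out.println(contains("java", Pattern.CASE_INSENSITIVE, "I like Java"));  // true
   }
}
```

A compiled Pattern can be reused to create any number of matchers, which is why the compile step is kept separate from the match step.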

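Capturing Groups

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:

  • ((A)(B(C)))
  • (A)
  • (B(C))
  • (C)

To find out how many groups are present in an expression, call the groupCount method on a matcher object. The groupCount method returns an int giving the number of capturing groups present in the matcher's pattern.

There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.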
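
The numbering can be checked with a short sketch. It matches the expression ((A)(B(C))) against the input "ABC" and lists every group, including group 0. The class name GroupCountSketch and the helper groups are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountSketch {

   // Hypothetical helper: returns group 0 through group groupCount(), or an empty array if there is no match.
   static String[] groups(String regex, String input) {
      Matcher m = Pattern.compile(regex).matcher(input);
      if (!m.matches()) {
         return new String[0];
      }
      String[] result = new String[m.groupCount() + 1];   // +1 because group 0 is not counted
      for (int i = 0; i <= m.groupCount(); i++) {
         result[i] = m.group(i);
      }
      return result;
   }

   public static void main(String[] args) {
      String[] g = groups("((A)(B(C)))", "ABC");
      System.out.println("groupCount: " + (g.length - 1));   // 4
      for (int i = 0; i < g.length; i++) {
         System.out.println("group " + i + ": " + g[i]);     // ABC, ABC, A, BC, C
      }
   }
}
```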

The following example shows how to find a digit string in a given alphanumeric string:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   public static void main( String args[] ) {
      // String to be scanned to find the pattern.
      String line = "This order was placed for QT3000! OK?";
      String pattern = "(.*)(\\d+)(.*)";

      // Create a Pattern object
      Pattern r = Pattern.compile(pattern);

      // Now create matcher object.
      Matcher m = r.matcher(line);
      if (m.find( )) {
         System.out.println("Found value: " + m.group(0) );
         System.out.println("Found value: " + m.group(1) );
         System.out.println("Found value: " + m.group(2) );
      }else {
         System.out.println("NO MATCH");
      }
   }
}

This will produce the following result:

Output

Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
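
In the output above, group 2 holds only "0" because the leading (.*) is greedy: it consumes as many characters as it can and leaves just one digit for (\d+). A possible variation, assuming the intent is to capture the whole number, uses the reluctant quantifier .*? for the first group. The class name ReluctantSketch and the helper firstNumber are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantSketch {

   // Hypothetical helper: returns the first run of digits in the line, or null if there is none.
   static String firstNumber(String line) {
      // (.*?) is reluctant, so it stops as soon as (\d+) can start matching.
      Matcher m = Pattern.compile("(.*?)(\\d+)(.*)").matcher(line);
      return m.find() ? m.group(2) : null;
   }

   public static void main(String[] args) {
      System.out.println(firstNumber("This order was placed for QT3000! OK?"));   // 3000
   }
}
```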

Regular Expression Syntax

The following table lists the regular expression metacharacter syntax available in Java:

Subexpression Matches
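^ Matches the beginning of the input, or of each line when MULTILINE mode is enabled.
$ Matches the end of the input, or of each line when MULTILINE mode is enabled.
. Matches any single character except a line terminator. The DOTALL flag (Pattern.DOTALL or the embedded (?s)) allows it to match line terminators as well.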
[…] Matches any single character in brackets.
[^…] Matches any single character not in brackets.
\A Beginning of the entire string.
\z End of the entire string.
\Z End of the entire string except allowable final line terminator.
re* Matches 0 or more occurrences of the preceding expression.
re+ Matches 1 or more occurrences of the preceding expression.
re? Matches 0 or 1 occurrence of the preceding expression.
re{n} Matches exactly n occurrences of the preceding expression.
re{n,} Matches n or more occurrences of the preceding expression.
re{n,m} Matches at least n and at most m occurrences of the preceding expression.
a|b Matches either a or b.
(re) Groups regular expressions and remembers the matched text.
(?: re) Groups regular expressions without remembering the matched text.
(?> re) Matches the independent pattern without backtracking.
\w Matches a word character. Equivalent to [a-zA-Z_0-9].
\W Matches a nonword character.
\s Matches a whitespace character. Equivalent to [ \t\n\x0B\f\r].
\S Matches a nonwhitespace character.
\d Matches a digit. Equivalent to [0-9].
\D Matches a nondigit.
\G Matches the point where the last match finished.
\n Back-reference to capture group number “n”.
\b Matches the word boundaries when outside the brackets. Matches the backspace (0x08) when inside the brackets.
\B Matches the nonword boundaries.
\n, \t, etc. Matches newlines, carriage returns, tabs, etc.
\Q Escape (quote) all characters up to \E.
\E Ends quoting begun with \Q.
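
A few of the constructs in the table can be tried out with the static Pattern.matches method, which tests whether an entire input matches an expression. This is only an illustrative sketch; the class name MetacharSketch and the helper fullMatch are hypothetical.

```java
import java.util.regex.Pattern;

class MetacharSketch {

   // Hypothetical helper: true if the whole input matches the regex.
   static boolean fullMatch(String regex, String input) {
      return Pattern.matches(regex, input);
   }

   public static void main(String[] args) {
      System.out.println(fullMatch("(\\w+) \\1", "hello hello"));   // true: \1 repeats group 1
      System.out.println(fullMatch("(?:ab)+", "ababab"));           // true: non-capturing group
      System.out.println(fullMatch("\\d{2,4}", "12345"));           // false: more than 4 digits
      System.out.println(fullMatch("\\Q1+1\\E=2", "1+1=2"));        // true: + is quoted
      System.out.println(fullMatch("1+1=2", "1+1=2"));              // false: + is a quantifier
      System.out.println(fullMatch("a.b", "a\nb"));                 // false: . skips line terminators
      System.out.println(fullMatch("(?s)a.b", "a\nb"));             // true: DOTALL enabled
   }
}
```

Note that every backslash in a regular expression must be doubled inside a Java string literal.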

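Methods of the Matcher Class

Here is a list of useful instance methods:

Index Methods

Index methods provide useful index values that show precisely where the match was found in the input string: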

Sr.No. Method & Description

1. public int start()
   Returns the start index of the previous match.

2. public int start(int group)
   Returns the start index of the subsequence captured by the given group during the previous match operation.

3. public int end()
   Returns the offset after the last character matched.

4. public int end(int group)
   Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
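
The group-specific variants can be illustrated with a small sketch that reports where each capturing group of a key=value pair was found. The class name SpanSketch and the helper span are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanSketch {

   // Hypothetical helper: returns {start, end} of the given group in the first match, or null if there is no match.
   static int[] span(String regex, String input, int group) {
      Matcher m = Pattern.compile(regex).matcher(input);
      if (!m.find()) {
         return null;
      }
      return new int[] { m.start(group), m.end(group) };
   }

   public static void main(String[] args) {
      String regex = "(\\w+)=(\\w+)";
      String input = "key=value";
      for (int g = 0; g <= 2; g++) {
         int[] s = span(regex, input, g);
         System.out.println("group " + g + ": " + s[0] + " to " + s[1]);
      }
      // group 0: 0 to 9
      // group 1: 0 to 3
      // group 2: 4 to 9
   }
}
```

Because end is exclusive, input.substring(start, end) yields exactly the captured text.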

Study Methods

Study methods examine the input string and return a boolean indicating whether or not the pattern was found:

Sr.No. Method & Description

1. public boolean lookingAt()
   Attempts to match the input sequence, starting at the beginning of the region, against the pattern.

2. public boolean find()
   Attempts to find the next subsequence of the input sequence that matches the pattern.

3. public boolean find(int start)
   Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.

4. public boolean matches()
   Attempts to match the entire region against the pattern.
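
The find(int start) variant is the least obvious of the four, so here is a minimal sketch of it. The class name FindFromSketch and the helper findFrom are hypothetical; the helper returns -1 when nothing is found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromSketch {

   // Hypothetical helper: start index of the first match at or after 'from', or -1 if there is none.
   static int findFrom(String regex, String input, int from) {
      Matcher m = Pattern.compile(regex).matcher(input);
      return m.find(from) ? m.start() : -1;   // find(int) resets the matcher before searching
   }

   public static void main(String[] args) {
      String input = "cat cat cat";
      System.out.println(findFrom("cat", input, 0));   // 0
      System.out.println(findFrom("cat", input, 1));   // 4
      System.out.println(findFrom("cat", input, 9));   // -1
   }
}
```

The start index must lie between 0 and the length of the input; otherwise find(int start) throws an IndexOutOfBoundsException.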

Replacement Methods

Replacement methods are useful methods for replacing text in an input string:

Sr.No. Method & Description

1. public Matcher appendReplacement(StringBuffer sb, String replacement)
   Implements a non-terminal append-and-replace step.

2. public StringBuffer appendTail(StringBuffer sb)
   Implements a terminal append-and-replace step.

3. public String replaceAll(String replacement)
   Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.

4. public String replaceFirst(String replacement)
   Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.

5. public static String quoteReplacement(String s)
   Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class.
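
In a replacement string, a dollar sign followed by a number refers to a capturing group, and a backslash escapes the next character. The sketch below shows why quoteReplacement matters when the replacement text should be taken literally. The class name QuoteReplacementSketch and its two helpers are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementSketch {

   // Hypothetical helper: replaces every match with the given text, taken literally.
   static String replaceLiteral(String regex, String input, String literal) {
      Matcher m = Pattern.compile(regex).matcher(input);
      return m.replaceAll(Matcher.quoteReplacement(literal));
   }

   // Hypothetical helper: replaces every match, interpreting $n as a group reference.
   static String replaceWithGroups(String regex, String input, String replacement) {
      return Pattern.compile(regex).matcher(input).replaceAll(replacement);
   }

   public static void main(String[] args) {
      // "$5" is quoted, so it is inserted as plain text rather than read as "group 5".
      System.out.println(replaceLiteral("X", "price: X", "$5"));         // price: $5
      // Unquoted, $1 is replaced by whatever group 1 captured.
      System.out.println(replaceWithGroups("(X)", "price: X", "[$1]"));  // price: [X]
   }
}
```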

The start and end Methods

The following example counts the number of times the word "cat" appears in the input string:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   private static final String REGEX = "\\bcat\\b";
   private static final String INPUT = "cat cat cat cattie cat";

   public static void main( String args[] ) {
      Pattern p = Pattern.compile(REGEX);
      Matcher m = p.matcher(INPUT);   // get a matcher object
      int count = 0;

      while(m.find()) {
         count++;
         System.out.println("Match number "+count);
         System.out.println("start(): "+m.start());
         System.out.println("end(): "+m.end());
      }
   }
}

This will produce the following result:

Output

Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22

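You can see that this example uses word boundaries to ensure that the letters "c", "a", "t" are not merely a substring of a longer word. It also gives some useful information about where in the input string the match occurred.

The start method returns the start index of the previous match (start(int group) does the same for the subsequence captured by a given group), and end returns the index of the last character matched, plus one.

The matches and lookingAt Methods

The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference is that matches requires the entire input sequence to be matched, while lookingAt does not.

Both methods always start at the beginning of the input string. Here is an example that illustrates this behavior: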

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   private static final String REGEX = "foo";
   private static final String INPUT = "fooooooooooooooooo";
   private static Pattern pattern;
   private static Matcher matcher;

   public static void main( String args[] ) {
      pattern = Pattern.compile(REGEX);
      matcher = pattern.matcher(INPUT);

      System.out.println("Current REGEX is: "+REGEX);
      System.out.println("Current INPUT is: "+INPUT);

      System.out.println("lookingAt(): "+matcher.lookingAt());
      System.out.println("matches(): "+matcher.matches());
   }
}

This will produce the following result:

Output

Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false

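The replaceFirst and replaceAll Methods

The replaceFirst and replaceAll methods replace the text that matches a given regular expression. As their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all occurrences.

Here is an example that illustrates this behavior: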

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   private static String REGEX = "dog";
   private static String INPUT = "The dog says meow. " + "All dogs say meow.";
   private static String REPLACE = "cat";

   public static void main(String[] args) {
      Pattern p = Pattern.compile(REGEX);
      
      // get a matcher object
      Matcher m = p.matcher(INPUT); 
      INPUT = m.replaceAll(REPLACE);
      System.out.println(INPUT);
   }
}

This will produce the following result:

Output

The cat says meow. All cats say meow.
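
The example above uses replaceAll only. For comparison, here is a minimal sketch of replaceFirst on the same input, where only the first "dog" is replaced. The class name ReplaceFirstSketch and the helper replaceFirstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceFirstSketch {

   // Hypothetical helper: replaces only the first match of the regex in the input.
   static String replaceFirstMatch(String regex, String input, String replacement) {
      Matcher m = Pattern.compile(regex).matcher(input);
      return m.replaceFirst(replacement);
   }

   public static void main(String[] args) {
      String input = "The dog says meow. All dogs say meow.";
      System.out.println(replaceFirstMatch("dog", input, "cat"));
      // The cat says meow. All dogs say meow.
   }
}
```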

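The appendReplacement and appendTail Methods

The Matcher class also provides the appendReplacement and appendTail methods for text replacement.

Here is an example that illustrates this behavior: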

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   private static String REGEX = "a*b";
   private static String INPUT = "aabfooaabfooabfoob";
   private static String REPLACE = "-";
   public static void main(String[] args) {

      Pattern p = Pattern.compile(REGEX);
      
      // get a matcher object
      Matcher m = p.matcher(INPUT);
      StringBuffer sb = new StringBuffer();
      while(m.find()) {
         m.appendReplacement(sb, REPLACE);
      }
      m.appendTail(sb);
      System.out.println(sb.toString());
   }
}

This will produce the following result:

Output

-foo-foo-foo-
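
With a fixed replacement string, the loop above does the same job as replaceAll. The append-and-replace pair becomes more useful when the replacement has to be computed for each match. The following sketch doubles every number found in the input; the class name AppendSketch and the helper doubleNumbers are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendSketch {

   // Hypothetical helper: replaces every run of digits with twice its numeric value.
   static String doubleNumbers(String input) {
      Matcher m = Pattern.compile("\\d+").matcher(input);
      StringBuffer sb = new StringBuffer();
      while (m.find()) {
         int doubled = Integer.parseInt(m.group()) * 2;
         // Copies the text before the match, then the computed replacement.
         m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(doubled)));
      }
      m.appendTail(sb);   // copies whatever follows the last match
      return sb.toString();
   }

   public static void main(String[] args) {
      System.out.println(doubleNumbers("1 apple and 12 pears"));   // 2 apple and 24 pears
   }
}
```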

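Methods of the PatternSyntaxException Class

A PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern. The PatternSyntaxException class provides the following methods to help you determine what went wrong: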

Sr.No. Method & Description

1. public String getDescription()
   Retrieves the description of the error.

2. public int getIndex()
   Retrieves the error index.

3. public String getPattern()
   Retrieves the erroneous regular expression pattern.

4. public String getMessage()
   Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern.
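
These methods can be exercised by compiling a deliberately malformed expression. The sketch below uses an unbalanced parenthesis as the error; the class name SyntaxErrorSketch and the helper tryCompile are hypothetical, and the exact wording of the description and message depends on the JDK in use.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorSketch {

   // Hypothetical helper: returns the exception raised by an invalid regex, or null if it compiles.
   static PatternSyntaxException tryCompile(String regex) {
      try {
         Pattern.compile(regex);
         return null;
      } catch (PatternSyntaxException e) {
         return e;
      }
   }

   public static void main(String[] args) {
      PatternSyntaxException e = tryCompile("(abc");   // opening parenthesis is never closed
      if (e != null) {
         System.out.println("Description: " + e.getDescription());
         System.out.println("Index: " + e.getIndex());
         System.out.println("Pattern: " + e.getPattern());
         System.out.println("Message: " + e.getMessage());
      }
   }
}
```

Because the exception is unchecked, the try/catch is optional; catching it is mainly worthwhile when the expression comes from user input or configuration.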