Python RegEx

📅 Last modified: 2020-09-19 14:58:37 🧑 Author: Mango

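In this tutorial, you will learn about regular expressions (RegEx) and use Python's re module to work with RegEx (with the help of examples).

A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. For example,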

^a...s$

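The above code defines a RegEx pattern. The pattern is: any five-character string starting with a and ending with s.

A pattern defined using RegEx can be used to match against a string.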

Expression	String	Matched?
`^a...s$`	`abs`	No match
	`alias`	Match
	`abyss`	Match
	`Alias`	No match
	`An abacus`	No match

Python has a module named re to work with RegEx. Here is an example:

import re

pattern = '^a...s$'
test_string = 'abyss'
result = re.match(pattern, test_string)

if result:
  print("Search successful.")
else:
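  print("Search unsuccessful.")

Here, we used the re.match() function to check whether pattern matches test_string. Note that re.match() only looks for a match at the beginning of the string. If the match succeeds, the function returns a match object; if not, it returns None.

Several other functions defined in the re module work with RegEx. Before we explore them, let's learn about regular expressions themselves.

If you already know the basics of RegEx, jump to Python RegEx.

Specify Pattern Using RegEx

To specify regular expressions, metacharacters are used. In the above example, ^ and $ are metacharacters.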
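
Metacharacters

Metacharacters are characters that a RegEx engine interprets in a special way. Here is the list of metacharacters:

[] . ^ $ * + ? {} () \ |

[] - Square brackets

Square brackets specify a set of characters you wish to match.

Expression	String	Matched?
`[abc]`	`a`	1 match
	`ac`	2 matches
	`Hey Jude`	No match
	`abc de ca`	5 matches

Here, [abc] matches if the string you are trying to match contains any of a, b or c.

You can also specify a range of characters using - inside square brackets.

[a-e] is the same as [abcde].
[1-4] is the same as [1234].
[0-39] is the same as [01239].

You can complement (invert) the character set by placing the caret symbol ^ at the start of the square brackets.

[^abc] means any character except a, b or c.
[^0-9] means any non-digit character.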
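
As a rough illustration of the sets and ranges above, the following sketch uses re.findall() (covered later in this tutorial) to list what each character class picks out. The variable names are chosen only for this illustration.

```python
import re

# A plain set: every a, b or c in the string is a separate match.
set_matches = re.findall(r'[abc]', 'abc de ca')
print(set_matches)       # ['a', 'b', 'c', 'c', 'a']

# A range: [a-e] behaves like [abcde] (uppercase letters are not included).
range_matches = re.findall(r'[a-e]', 'Hey Jude')
print(range_matches)     # ['e', 'd', 'e']

# A range plus a single character: [0-39] behaves like [01239].
mixed_matches = re.findall(r'[0-39]', '0123456789')
print(mixed_matches)     # ['0', '1', '2', '3', '9']

# An inverted set: everything that is not a digit.
inverted_matches = re.findall(r'[^0-9]', 'a1b22c')
print(inverted_matches)  # ['a', 'b', 'c']
```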

. - Period

A period matches any single character (except newline '\n').

Expression	String	Matched?
`..`	`a`	No match
	`ac`	1 match
	`acd`	1 match
	`acde`	2 matches (contains 4 characters)
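
The table above can be reproduced with a short sketch. The last two calls are an addition to the text: they show the newline exception and how the re.DOTALL flag lifts it.

```python
import re

# '..' consumes two characters per match, so matches never overlap.
one_char = re.findall(r'..', 'a')
three_chars = re.findall(r'..', 'acd')
four_chars = re.findall(r'..', 'acde')
print(one_char, three_chars, four_chars)  # [] ['ac'] ['ac', 'de']

# By default the period skips the newline character.
without_flag = re.findall(r'.', 'a\nb')
print(without_flag)                       # ['a', 'b']

# With re.DOTALL the period matches the newline as well.
with_flag = re.findall(r'.', 'a\nb', re.DOTALL)
print(with_flag)                          # ['a', '\n', 'b']
```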
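
^ - Caret

The caret symbol ^ is used to check if a string starts with a certain character.

Expression	String	Matched?
`^a`	`a`	1 match
	`abc`	1 match
	`bac`	No match
`^ab`	`abc`	1 match
	`acb`	No match (starts with `a` but not followed by `b`)

$ - Dollar

The dollar symbol $ is used to check if a string ends with a certain character.

Expression	String	Matched?
`a$`	`a`	1 match
	`formula`	1 match
	`cab`	No match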
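
A minimal sketch of the two anchors, using re.search() (covered later in this tutorial). The helper name is_match is hypothetical and only wraps the result in a boolean.

```python
import re

def is_match(pattern, text):
    """Return True if pattern matches anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

# ^ anchors the pattern to the start of the string.
print(is_match(r'^ab', 'abc'))     # True
print(is_match(r'^ab', 'acb'))     # False: starts with a, but b does not follow

# $ anchors the pattern to the end of the string.
print(is_match(r'a$', 'formula'))  # True
print(is_match(r'a$', 'cab'))      # False: the string ends with b
```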
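
* - Star

The star symbol * matches zero or more occurrences of the pattern to its left.

Expression	String	Matched?
`ma*n`	`mn`	1 match
	`man`	1 match
	`maaan`	1 match
	`main`	No match (`a` is not followed by `n`)
	`woman`	1 match

+ - Plus

The plus symbol + matches one or more occurrences of the pattern to its left.

Expression	String	Matched?
`ma+n`	`mn`	No match (no `a` character)
	`man`	1 match
	`maaan`	1 match
	`main`	No match (`a` is not followed by `n`)
	`woman`	1 match

? - Question mark

The question mark symbol ? matches zero or one occurrence of the pattern to its left.

Expression	String	Matched?
`ma?n`	`mn`	1 match
	`man`	1 match
	`maaan`	No match (more than one `a` character)
	`main`	No match (`a` is not followed by `n`)
	`woman`	1 match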
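
The three tables above can be checked side by side with a small sketch. The names words and results are chosen for this illustration only.

```python
import re

words = ['mn', 'man', 'maaan', 'main', 'woman']

# For each quantifier, collect the words in which the pattern is found.
results = {}
for pattern in (r'ma*n', r'ma+n', r'ma?n'):
    results[pattern] = [w for w in words if re.search(pattern, w)]
    print(pattern, results[pattern])

# ma*n ['mn', 'man', 'maaan', 'woman']   zero or more a
# ma+n ['man', 'maaan', 'woman']         one or more a
# ma?n ['mn', 'man', 'woman']            zero or one a
```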
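
{} - Braces

Consider this code: {n,m}. This means at least n, and at most m, repetitions of the pattern to its left.

Expression	String	Matched?
`a{2,3}`	`abc dat`	No match
	`abc daat`	1 match (`aa` in `daat`)
	`aabc daaat`	2 matches (`aa` in `aabc` and `aaa` in `daaat`)
	`aabc daaaat`	2 matches (`aa` in `aabc` and `aaa` in `daaaat`)

Let's try one more example. The RegEx [0-9]{2,4} matches at least 2 digits but not more than 4 digits. Do not put a space after the comma inside the braces.

Expression	String	Matched?
`[0-9]{2,4}`	`ab123csde`	1 match (`123`)
	`12 and 345673`	3 matches (`12`, `3456`, `73`)
	`1 and 2`	No match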
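
A short sketch of the {n,m} quantifier using re.findall(). It shows that the quantifier is greedy: it takes as many repetitions as it is allowed before the next match starts.

```python
import re

# Four a's in a row: the first match takes three, the leftover single a is too short.
a_runs = re.findall(r'a{2,3}', 'aabc daaaat')
print(a_runs)       # ['aa', 'aaa']

# Six digits in a row are split into a run of four and a run of two.
digit_runs = re.findall(r'[0-9]{2,4}', '12 and 345673')
print(digit_runs)   # ['12', '3456', '73']

# Single digits never reach the minimum of two repetitions.
no_runs = re.findall(r'[0-9]{2,4}', '1 and 2')
print(no_runs)      # []
```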
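
| - Alternation

The vertical bar | is used for alternation (the or operator).

Expression	String	Matched?
`a|b`	`cde`	No match
	`ade`	1 match (`a`)
	`acdbea`	3 matches (`a`, `b`, `a`)

Here, a|b matches any string that contains either a or b.

() - Group

Parentheses () are used to group sub-patterns. For example, (a|b|c)xz matches any string that contains a, b or c followed by xz.

Expression	String	Matched?
`(a|b|c)xz`	`ab xz`	No match
	`abxz`	1 match (`bxz`)
	`axz cabxz`	2 matches (`axz` and `bxz`)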
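
The sketch below reproduces the alternation and group tables. One detail that goes beyond the text above: when a pattern contains a group, re.findall() returns the text captured by the group rather than the whole match, so re.finditer() is used here to get the full matches.

```python
import re

# Alternation: each a or b is a separate match.
either = re.findall(r'a|b', 'acdbea')
print(either)        # ['a', 'b', 'a']

# Group followed by xz: iterate over match objects to get the whole match.
full_matches = [m.group() for m in re.finditer(r'(a|b|c)xz', 'axz cabxz')]
print(full_matches)  # ['axz', 'bxz']

# With one group in the pattern, findall() returns only the group's text.
group_only = re.findall(r'(a|b|c)xz', 'axz cabxz')
print(group_only)    # ['a', 'b']
```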
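
\ - Backslash

The backslash \ is used to escape various characters, including all metacharacters. For example,

\$a matches if a string contains $ followed by a. Here, $ is not interpreted by the RegEx engine in a special way.

If you are unsure whether a punctuation character has a special meaning, you can put \ in front of it. This makes sure the character is not treated in a special way. Do not do this with letters or digits, because a backslash followed by a letter or digit can form a special sequence such as \d.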
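
A minimal sketch of escaping. The second half uses re.escape(), a standard re function that is not mentioned in the text above; it adds the backslashes for you when the text to match comes from elsewhere. The sample strings are made up for illustration.

```python
import re

# Escaped $ is a literal dollar sign, not the end-of-string anchor.
dollar_matches = re.findall(r'\$a', 'cost: $a and $b')
print(dollar_matches)            # ['$a']

# Unescaped, + and ? act as quantifiers, so the literal text is not found.
literal = '1+1=2?'
unescaped = re.search(literal, 'Is 1+1=2?')
print(unescaped)                 # None

# re.escape() returns a pattern that matches the text literally.
escaped = re.search(re.escape(literal), 'Is 1+1=2?')
print(escaped.group())           # 1+1=2?
```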
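
Special Sequences

Special sequences make commonly used patterns easier to write. Here is a list of special sequences:

\A - Matches if the specified characters are at the start of a string.

Expression	String	Matched?
`\Athe`	`the sun`	Match
	`In the sun`	No match

\b - Matches if the specified characters are at the beginning or end of a word.

Expression	String	Matched?
`\bfoo`	`football`	Match
	`a football`	Match
	`afootball`	No match
`foo\b`	`the foo`	Match
	`the afoo test`	Match
	`the afootest`	No match

\B - Opposite of \b. Matches if the specified characters are not at the beginning or end of a word.

Expression	String	Matched?
`\Bfoo`	`football`	No match
	`a football`	No match
	`afootball`	Match
`foo\B`	`the foo`	No match
	`the afoo test`	No match
	`the afootest`	Match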
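
The word-boundary tables can be checked with the sketch below. Raw strings (the r prefix, explained at the end of this tutorial) are used so that \b reaches the RegEx engine unchanged. The list names are illustrative.

```python
import re

starts = ['football', 'a football', 'afootball']
ends = ['the foo', 'the afoo test', 'the afootest']

# \b requires a word boundary at that position, \B requires that there is none.
foo_at_word_start = [s for s in starts if re.search(r'\bfoo', s)]
foo_inside_word = [s for s in starts if re.search(r'\Bfoo', s)]
foo_at_word_end = [s for s in ends if re.search(r'foo\b', s)]
foo_not_at_end = [s for s in ends if re.search(r'foo\B', s)]

print(foo_at_word_start)  # ['football', 'a football']
print(foo_inside_word)    # ['afootball']
print(foo_at_word_end)    # ['the foo', 'the afoo test']
print(foo_not_at_end)     # ['the afootest']
```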
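
\d - Matches any decimal digit. For ASCII text, equivalent to [0-9].

Expression	String	Matched?
`\d`	`12abc3`	3 matches (`1`, `2`, `3`)
	`Python`	No match

\D - Matches any non-decimal digit. For ASCII text, equivalent to [^0-9].

Expression	String	Matched?
`\D`	`1ab34"50`	3 matches (`a`, `b`, `"`)
	`1345`	No match

\s - Matches where a string contains any whitespace character. Equivalent to [ \t\n\r\f\v].

Expression	String	Matched?
`\s`	`Python RegEx`	1 match
	`PythonRegEx`	No match

\S - Matches where a string contains any non-whitespace character. Equivalent to [^ \t\n\r\f\v].

Expression	String	Matched?
`\S`	`a b`	2 matches (`a` and `b`)
	`   `	No match

\w - Matches any alphanumeric character (digits and letters). For ASCII text, equivalent to [a-zA-Z0-9_]. By the way, the underscore _ is also considered an alphanumeric character.

Expression	String	Matched?
`\w`	`12&": ;c`	3 matches (`1`, `2`, `c`)
	`%"> !`	No match

\W - Matches any non-alphanumeric character. For ASCII text, equivalent to [^a-zA-Z0-9_].

Expression	String	Matched?
`\W`	`1a2%c`	1 match (`%`)
	`Python`	No match

\Z - Matches if the specified characters are at the end of a string.

Expression	String	Matched?
`Python\Z`	`I like Python`	1 match
	`I like Python Programming`	No match
	`Python is fun.`	No match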
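
The following sketch runs the character-type sequences from the tables above through re.findall(), and the \A and \Z anchors through re.search(). Variable names are illustrative.

```python
import re

digits = re.findall(r'\d', '12abc3')
non_digits = re.findall(r'\D', '1ab34"50')
spaces = re.findall(r'\s', 'Python RegEx')
non_spaces = re.findall(r'\S', 'a b')
word_chars = re.findall(r'\w', '12&": ;c')
non_word_chars = re.findall(r'\W', '1a2%c')

print(digits)          # ['1', '2', '3']
print(non_digits)      # ['a', 'b', '"']
print(spaces)          # [' ']
print(non_spaces)      # ['a', 'b']
print(word_chars)      # ['1', '2', 'c']
print(non_word_chars)  # ['%']

# \A and \Z anchor to the very start and very end of the string.
starts_with_the = re.search(r'\Athe', 'the sun') is not None
ends_with_python = re.search(r'Python\Z', 'I like Python') is not None
python_not_at_end = re.search(r'Python\Z', 'Python is fun.') is not None
print(starts_with_the, ends_with_python, python_not_at_end)  # True True False
```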
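
Tip: To build and test regular expressions, you can use a RegEx tester tool such as regex101. Such a tool not only helps you create regular expressions, but also helps you learn them.

Now that you understand the basics of RegEx, let's discuss how to use RegEx in your Python code.

Python RegEx

Python has a module named re to work with regular expressions. To use it, we need to import the module.
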
import re

The module defines several functions and constants to work with RegEx.

re.findall()

The re.findall() method returns a list of strings containing all matches.

Example 1: re.findall()

# Program to extract numbers from a string

import re

string = 'hello 12 hi 89. Howdy 34'
pattern = r'\d+'

result = re.findall(pattern, string) 
print(result)

# Output: ['12', '89', '34']


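If the pattern is not found, re.findall() returns an empty list.

re.split()

The re.split() method splits the string where there is a match and returns a list of the resulting pieces.

Example 2: re.split()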

import re

string = 'Twelve:12 Eighty nine:89.'
pattern = r'\d+'

result = re.split(pattern, string) 
print(result)

# Output: ['Twelve:', ' Eighty nine:', '.']


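If the pattern is not found, re.split() returns a list containing the original string.

You can pass the maxsplit argument to the re.split() method. It is the maximum number of splits that will occur.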

import re

string = 'Twelve:12 Eighty nine:89 Nine:9.'
pattern = r'\d+'

# maxsplit = 1
# split only at the first occurrence
result = re.split(pattern, string, maxsplit=1)
print(result)

# Output: ['Twelve:', ' Eighty nine:89 Nine:9.']


By the way, the default value of maxsplit is 0, meaning all possible splits.

re.sub()

The syntax of re.sub() is:

re.sub(pattern, replace, string)

The method returns a string in which the matched occurrences are replaced with the content of the replace variable.

Example 3: re.sub()

# Program to remove all whitespaces
import re

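# string written across two source lines; the trailing backslash
# joins them, so only the explicit \n is a real newline
string = 'abc 12\
de 23 \n f45 6'

# matches all whitespace characters
pattern = r'\s+'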

# empty string
replace = ''

new_string = re.sub(pattern, replace, string) 
print(new_string)

# Output: abc12de23f456


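If the pattern is not found, re.sub() returns the original string.

You can pass count as a fourth parameter to the re.sub() method. If omitted, it defaults to 0, which replaces all occurrences.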

import re

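# string written across two source lines (the trailing backslash joins them)
string = 'abc 12\
de 23 \n f45 6'

# matches all whitespace characters
pattern = r'\s+'
replace = ''

# count=1 replaces only the first match
new_string = re.sub(pattern, replace, string, count=1)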
print(new_string)

# Output:
# abc12de 23
# f45 6


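re.subn()

The re.subn() method is similar to re.sub(), except that it returns a tuple of 2 items containing the new string and the number of substitutions made.

Example 4: re.subn()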

# Program to remove all whitespaces
import re

# string written across two source lines (the trailing backslash joins them)
string = 'abc 12\
de 23 \n f45 6'

# matches all whitespace characters
pattern = r'\s+'

# empty string
replace = ''

new_string = re.subn(pattern, replace, string) 
print(new_string)

# Output: ('abc12de23f456', 4)


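re.search()

The re.search() method takes two arguments: a pattern and a string. The method looks for the first location where the RegEx pattern produces a match with the string.

If the search is successful, re.search() returns a match object; if not, it returns None.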

match = re.search(pattern, str)

Example 5: re.search()

import re

string = "Python is fun"

# check if 'Python' is at the beginning
match = re.search(r'\APython', string)

if match:
  print("pattern found inside the string")
else:
  print("pattern not found")  

# Output: pattern found inside the string


Here, match contains a match object.
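
Since this tutorial uses both re.match() and re.search(), a small sketch of the difference may help: re.match() only tries the pattern at the beginning of the string, while re.search() scans the whole string. The sample text is made up for illustration.

```python
import re

text = 'I like Python'

# match() fails because the string does not begin with 'Python'.
from_start = re.match(r'Python', text)
print(from_start)          # None

# search() finds the first occurrence anywhere in the string.
anywhere = re.search(r'Python', text)
print(anywhere.span())     # (7, 13)
```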
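
Match object

You can get the methods and attributes of a match object using the dir() function.

Some of the commonly used methods and attributes of match objects are:

match.group()

The group() method returns the part of the string where there is a match.

Example 6: Match object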

import re

string = '39801 356, 2102 1111'

# Three digit number followed by space followed by two digit number
pattern = r'(\d{3}) (\d{2})'

# match variable contains a Match object.
match = re.search(pattern, string) 

if match:
  print(match.group())
else:
  print("pattern not found")

# Output: 801 35


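Here, the match variable contains a match object.

Our pattern (\d{3}) (\d{2}) has two subgroups, (\d{3}) and (\d{2}). You can get the part of the string matched by each of these parenthesized subgroups. Here's how: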

>>> match.group(1)
'801'

>>> match.group(2)
'35'
>>> match.group(1, 2)
('801', '35')

>>> match.groups()
('801', '35')


match.start(), match.end() and match.span()

The start() function returns the index of the start of the matched substring. Similarly, end() returns the end index of the matched substring.

>>> match.start()
2
>>> match.end()
8

The span() function returns a tuple containing the start and end index of the matched part.

>>> match.span()
(2, 8)

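match.re and match.string

The re attribute of a match object returns a regular expression object. Similarly, the string attribute returns the string that was passed.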

>>> match.re
re.compile('(\\d{3}) (\\d{2})')

>>> match.string
'39801 356, 2102 1111'


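We have covered all the commonly used methods defined in the re module. If you want to learn more, visit the Python 3 re module documentation.

Using r prefix before RegEx

When the r or R prefix is used before a regular expression, it means a raw string. For example, '\n' is a new line whereas r'\n' means two characters: a backslash \ followed by n.

The backslash \ is used to escape various characters, including all metacharacters. However, with the r prefix, Python itself treats \ as a normal character, so the backslash reaches the RegEx engine unchanged.

Example 7: Raw string using r prefix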

import re

string = '\n and \r are escape sequences.'

result = re.findall(r'[\n\r]', string) 
print(result)

# Output: ['\n', '\r']
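
To see why the r prefix matters in practice, here is a small sketch built around \b. In a normal Python string '\b' is the backspace character, so the RegEx engine never sees the word-boundary sequence. The variable names are illustrative.

```python
import re

# A normal string turns \n into one character; a raw string keeps two.
print(len('\n'), len(r'\n'))   # 1 2

# Without r, the pattern starts with a backspace character, which 'foo' does not contain.
without_prefix = re.findall('\bfoo', 'foo')
print(without_prefix)          # []

# With r, the engine receives \b and treats it as a word boundary.
with_prefix = re.findall(r'\bfoo', 'foo')
print(with_prefix)             # ['foo']
```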
