正则表达式详细实例(正则表达式)
正则表达式就是描述字符串排列的一套规则。通常被用来检索、替换那些符合某个模式(规则)的文本。 为什么要学习正则表达式呢,因为我们在爬取数据的时候需要根据我们想要获取的内容来进行爬取,而正则表达式就具有这个基本功能。 在Python中,一般我们会使用re模块来实现Python正则表达式的功能。
re常用函数解析:
练习:
import re
print(re.match("www","www.baidu.com").span())
print(re.match("www","wwwbaidu.com"))
print(re.match("www","ww.baidu.com"))
print(re.match("www","baidu.wwwcom"))
print(re.match("www","wWw.wwwcom",flags=re.I))
#扫描字符串,返回从头起手位置成功的匹配
复制
练习:
import re
print(re.search("sunck","good man is sunck!sunck is nice"))
print(re.search("sunck","good man is Sunck!sunck is nice"))
print(re.search("sunck","good man is Sunck!sunck is nice",flags=re.I))
复制
import re
print(re.findall("sunck","good man is sunck!Sunck is nice",flags=re.I))
复制
正则表达式元字符: 正则表达式中: 在字符串前加上 r 这个前缀来避免部分疑惑,因为 r 开头的 python字符串是 raw 字符串,所以里面的所有字符都不会被转义 。 一般正则表达式使用反斜杆(\)来转义特殊字符,使其可以匹配字符本身,而不是指定其他特殊的含义。
练习:
import re
print(re.search(".","sunck is a good man 7"))
print(re.findall("[^0-9a-zA-Z_]","sunck is a good man 7"))
print(re.search("\d","sunck is a good man 7"))
print(re.search("[^\d]","sunck is a good man 7"))
print(re.findall("[\W]","sunck is a good man 7 "))
print(re.findall("[\n]","sunck is a good man 7 \n"))
print(re.findall(".","sunck is a good man 7 \n",flags=re.S))
复制
锚字符:
练习:
import re
print(re.search("^sunck","sunck is good man"))
print(re.search("man$","sunck is good man"))
#re.M 多行匹配
print(re.findall("^sunck","sunck is good man\nsunck is good man",re.M))
print(re.search("\Asunck","sunck is good man\nsunck is good man",re.M))
print(re.findall("man$","sunck is good man\nsunck is good man",re.M))
print(re.search("man\Z","sunck is good man\nsunck is good man",re.M))
复制
匹配多个字符
练习:
import re
print(re.findall(r"a?","aaa")) #非贪婪匹配
print(re.findall(r"a*","aaa")) #贪婪匹配
print(re.findall(r".*","aaabaa")) #贪婪匹配
print(re.findall(r"a ","aabaaaa")) #贪婪匹配
print(re.findall(r"a ","aba"))
print(re.findall(r"a{3}","aaaaabaaaaaa"))
复制
特殊:
import re
print(re.findall(r"//*.*/*/", "/* part1 */ /* part */"))
//* 后面的/是转义特殊字符
print(re.findall(r"//*.*?/*/", "/* part1 */ /* part */"))
复制
re模块深入:
str3="sunck is a good man!sunck is a nice man!sunck is a handsome man"
d=re.finditer(r"(sunck)",str3)
while True:
try:
i=next(d)
print(d)
except StopIteration as e:
break
复制
字符串的替换和修改:
str5="sunck is a good good good man!"
print(re.sub(r"(good)","nice",str5))
print(type(re.sub(r"(good)","nice",str5,count=2)))
print(re.subn(r"(good)","nice",str5))
print(type(re.subn(r"(good)","nice",str5)))
复制
分组:
str6="010-53247654"
m=re.match(r"(\d{3})-(\d{8})",str6)
#使用序号获取对应组的信息,group(0)代表原始字符串
print(m.group(0))
print(m.group(1))
print(m.group(2))
print(m.groups())
m1=re.match(r"((\d{3})-(\d{8}))",str6)
print(m1.group(0))
print(m1.group(1))
print(m1.group(2))
print(m1.group(3))
print(m1.groups())
m2=re.match(r"(?P\d{3})-(?P(\d{8}))",str6)
print(m2.group("zz"))
print(m2.group("ss"))
复制
编译:
compile(pattern,flags=0)
pattern:要编译的正则表达式
pat=r"^1(([3578]\d)|(47))\d{8}$"
print(re.match(pat,"13600000000"))
#编译成正则对象
re_telephon=re.compile(pat)
print(re_telephon.match("13600000000"))
#re模块调用 re.match(pattern,string,flags=0)
#re对象调用 re_telephon.match(string)
#re模块调用 re.finditer(pattern,string,flags=0)
#re对象调用 re_telephon.finditer(string)
#re模块调用re.split(pattern,string=,maxsplit=,flags=0)
#re对象调用re_telephon.split(string=,maxsplit=)
#re.sub(pattern=,repl=,string=,count=,flags=)
#re_telephon.sub(repl=,string=,count=)
复制
正则表达式实例练习:
总结: 在学完了正则表达式之后,我们需要多加练习。 有了一定的掌握就可以进入实战状态了。
,免责声明:本文仅代表文章作者的个人观点,与本站无关。其原创性、真实性以及文中陈述文字和内容未经本站证实,对本文以及其中全部或者部分内容文字的真实性、完整性和原创性本站不作任何保证或承诺,请读者仅作参考,并自行核实相关内容。文章投诉邮箱:anhduc.ph@yahoo.com