Regular Expressions

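Whitespace inside a regular expression is significant, so spaces cannot be added freely (unless the re.VERBOSE flag is set)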

import re

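Patterns are usually written as raw strings
Wildcard: . matches any character except a newline
\. matches a literal . character
\d matches a digit
| logical OR
() treats the characters inside the parentheses as a single unit
Metacharacters: characters that have a special function
Character class: r'[aeiou]' matches any one of the listed characters, exactly once; [a-z] denotes a range
Repetition counts:
r'ab{3}c' b must occur exactly three times
r'ab{3,10}c' gives a range for the number of repetitions (3 to 10)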
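A minimal sketch of the basics listed above (wildcard, character class, repetition counts). The sample strings and variable names are made up for illustration.

```python
import re

# '.' matches any single character except a newline.
dot_ok = re.search(r'a.c', 'abc')
dot_newline = re.search(r'a.c', 'a\nc')

# A character class matches exactly one character from the set.
vowels = re.findall(r'[aeiou]', 'regular')

# {3} means exactly three repetitions of the preceding item.
exact = re.fullmatch(r'ab{3}c', 'abbbc')

# {3,10} means three to ten repetitions (written with a comma).
too_few = re.fullmatch(r'ab{3,10}c', 'abbc')
in_range = re.fullmatch(r'ab{3,10}c', 'abbbbbc')

print(vowels)                                   # ['e', 'u', 'a']
print(bool(dot_ok), bool(dot_newline))          # True False
print(bool(exact), bool(too_few), bool(in_range))  # True False True
```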
Example: matching an IPv4 address
re.search(r'(([01]\d\d|2[0-4]\d|25[0-5]|\d\d|\d)\.){3}([01]\d\d|2[0-4]\d|25[0-5]|\d\d|\d)', '192.168.2.5')
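The same idea can be wrapped in a small helper. This is only a sketch: the names OCTET, IPV4 and is_ipv4 are hypothetical, and it uses fullmatch so that the whole string has to be an address. Note that the 200-249 range is written 2[0-4]\d, so that values such as 249 are accepted.

```python
import re

# One octet: 250-255, 200-249, three digits starting with 0 or 1,
# two digits, or a single digit.
OCTET = r'(?:25[0-5]|2[0-4]\d|[01]\d\d|\d\d|\d)'

# Three "octet + dot" groups followed by a final octet.
IPV4 = re.compile(r'(?:' + OCTET + r'\.){3}' + OCTET)

def is_ipv4(text):
    """Return True if the whole string is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None

print(is_ipv4('192.168.2.5'))  # True
print(is_ipv4('249.0.0.1'))    # True
print(is_ipv4('256.1.1.1'))    # False
```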

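Character matching:

Metacharacters:
. ^ $ * + ? {} [] \ | ()
[]:
Still metacharacters inside []: ^ - \
[^5]  # matches any character except 5
[$]  # here $ is not a metacharacter; it matches a literal $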
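A short sketch of how these character-class rules behave in practice. The sample strings are invented for illustration.

```python
import re

# [^5] matches any single character that is not '5'.
not_five = re.findall(r'[^5]', '1525')

# Inside a class, '$' is an ordinary character.
dollars = re.findall(r'[$]', 'cost: $5 or $10')

# '-' between two characters forms a range; a leading '^' negates the class.
non_lower = re.findall(r'[^a-z]', 'abC-9')

print(not_five)   # ['1', '2']
print(dollars)    # ['$', '$']
print(non_lower)  # ['C', '-', '9']
```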
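\:
\ + metacharacter: removes its special function
\ + ordinary character: gives it a special function (for example \d, \w)
\1 refers to the previously matched group number 1 (a backreference)
\ followed by an octal code matches the character with that octal value
re.DOTALL flag: makes . match every character, including a newline
re.ASCII flag: makes \w match ASCII characters only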
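The effect of the two flags can be seen in a small sketch like the one below. The sample text is made up, and the non-ASCII letters are written as escapes so the source stays plain ASCII.

```python
import re

text = 'first\nsecond'

# Without DOTALL, '.' stops at the newline.
plain = re.match(r'.*', text).group()
# With DOTALL, '.' also matches the newline.
dotall = re.match(r'.*', text, re.DOTALL).group()

# For str patterns \w matches Unicode word characters by default;
# re.ASCII restricts it to [a-zA-Z0-9_].
sample = 'na\u00efve caf\u00e9'
unicode_words = re.findall(r'\w+', sample)
ascii_words = re.findall(r'\w+', sample, re.ASCII)

print(repr(plain))    # 'first'
print(repr(dotall))   # 'first\nsecond'
print(ascii_words)    # ['na', 've', 'caf']
```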
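*:
Specifies that the preceding character is matched zero or more times
+:
Specifies that the preceding character is matched one or more times
?:
Specifies that the preceding character is matched zero or one time
{m,n}:
Specifies that the preceding character is matched between m and n times; if omitted, m defaults to 0 and n to infinity
After a repetition operator, ? selects non-greedy mode: it matches as few characters as possible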
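A minimal sketch contrasting greedy and non-greedy repetition, plus the open-ended forms of {m,n}. The sample strings are invented for illustration.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: '.*' grabs as much as possible, up to the last '>'.
greedy = re.search(r'<.*>', html).group()
# Non-greedy: '.*?' stops at the first '>'.
lazy = re.search(r'<.*?>', html).group()

# '?' makes the preceding character optional.
spellings = re.findall(r'colou?r', 'color colour')

# {2,} means two or more; {,2} means at most two.
two_or_more = re.fullmatch(r'a{2,}', 'aaaa')
at_most_two = re.fullmatch(r'a{,2}', 'aaa')

print(greedy)     # <b>bold</b> and <i>italic</i>
print(lazy)       # <b>
print(spellings)  # ['color', 'colour']
print(bool(two_or_more), bool(at_most_two))  # True False
```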

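Zero-width assertions:

Some metacharacters do not match any characters; they simply succeed or fail
\b asserts that the current position is at a word boundary
| The alternation (OR) operator; it ORs two regular expressions
Use \| to match a literal '|' character, or put it inside a character class: [|]
^ Matches the start of the string. If the MULTILINE flag is set, it also matches at the start of each line, that is, immediately after each newline
$ Matches the end of the string; with MULTILINE set, it also matches immediately before each newline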
Use \$ to match a literal '$' character, or put it inside a character class: [$]
And so on
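The assertions above consume no characters; they only test a position. A small sketch with a made-up three-line string:

```python
import re

text = 'cat\ncatalog\nbobcat'

# \b: match 'cat' only where it stands as a whole word.
whole_words = re.findall(r'\bcat\b', text)

# ^ without MULTILINE matches only at the start of the string.
start_only = re.findall(r'^cat', text)
# With MULTILINE it also matches right after each newline.
each_line_start = re.findall(r'^cat', text, re.MULTILINE)

# $ without MULTILINE matches only at the end of the string.
end_only = re.findall(r'cat$', text)
# With MULTILINE it also matches right before each newline.
each_line_end = re.findall(r'cat$', text, re.MULTILINE)

# Literal '|' and '$' via a character class or a backslash escape.
literals = re.findall(r'\$|[|]', 'a|b$')

print(len(whole_words), len(start_only), len(each_line_start))  # 1 1 2
print(len(end_only), len(each_line_end))                        # 1 2
print(literals)                                                 # ['|', '$']
```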
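  1. Python string escapes and regular-expression escapes conflict on some characters, so use raw strings r'......' wherever possible
  2. Inside a character class, assertions are not metacharacters
Backreference: \N accesses a group by its number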
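The two notes and the numbered backreference can be illustrated with a short sketch. The sample sentence is invented.

```python
import re

# In an ordinary string '\b' is a single backspace character;
# in a raw string it stays as backslash + 'b' for the regex engine.
ordinary_len = len('\b')
raw_len = len(r'\b')

# Inside a character class \b is not a word-boundary assertion;
# it matches the backspace character instead.
backspace = re.search(r'[\b]', 'a\x08c')

# \1 must repeat exactly what group 1 matched: find a doubled word.
doubled = re.search(r'\b(\w+)\s+\1\b', 'Paris in the the spring')

print(ordinary_len, raw_len)  # 1 2
print(doubled.group())        # the the
print(doubled.group(1))       # the
```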

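Non-capturing and named groups:

Non-capturing group: adding a non-capturing group does not affect the numbering of the other (capturing) groups
(?:...)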
>>> m = re.match("([abc])+", "abc")
>>> m.groups()
('c',)
>>> m = re.match("(?:[abc])+", "abc")
>>> m.groups()
()
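Named group: apart from having a name, a named group behaves exactly like any other capturing group
Ordinary groups are accessed by number; named groups can also be accessed by a meaningful name
All match-object methods that accept group numbers also accept group names given as strings. A named group can still be accessed by its number as well
(?P<name>...)
>>> p = re.compile(r'(?P<word>\b\w+\b)')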
>>> m = p.search( '(((( Lots of punctuation )))' )
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'
(?P=name) backreference
(\b\w+)\s+\1 can be rewritten as (?P<name>\b\w+)\s+(?P=name)
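A sketch that combines a non-capturing group, named groups and a named backreference. The pattern names (date, year, month, day, word) and the sample strings are hypothetical.

```python
import re

# (?:on|at) groups the alternatives without taking a group number,
# so 'year' is still group 1.
date = re.compile(
    r'(?:on|at)\s+(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
)
m = date.search('released on 2020-05-11')

print(m.group('year'))  # 2020
print(m.group(1))       # 2020
print(m.groupdict())    # {'year': '2020', 'month': '05', 'day': '11'}

# (?P=word) must repeat whatever the group named 'word' matched.
repeat = re.search(r'(?P<word>\b\w+)\s+(?P=word)\b', 'it is is fine')
print(repeat.group())        # is is
print(repeat.group('word'))  # is
```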

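Other zero-width assertions:

Positive lookahead assertion:
(?=...)  # Positive lookahead. Succeeds if the contained regular expression (shown here as ...) matches at the current position, and fails otherwise. Once the contained expression has been tried, the matching engine does not advance; the rest of the pattern is tried starting at the position where the assertion began
Negative lookahead assertion:
(?!...)  # Negative lookahead. The opposite of positive lookahead: it succeeds if the contained expression does not match, and fails if it does
Negative lookahead is useful when a name must be excluded
re.search(r".*[.](?!bat$|exe$)[^.]*$", "fish.baat")  # excludes names ending in .bat or .exe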
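A sketch of both lookahead forms. The file names and the names not_script and allowed are made up for illustration. The tail of the pattern is [^.]*$ rather than .*$ so that a name with several dots, such as a.b.bat, is still rejected.

```python
import re

# Positive lookahead: digits that are followed by ' USD',
# without including ' USD' in the match.
amount = re.search(r'\d+(?= USD)', 'price: 100 USD').group()

# Negative lookahead: accept a file name unless its extension
# is exactly 'bat' or 'exe'.
not_script = re.compile(r'.*[.](?!bat$|exe$)[^.]*$')
names = ['fish.baat', 'autoexec.bat', 'setup.exe', 'sendmail.cf', 'a.b.bat']
allowed = [name for name in names if not_script.match(name)]

print(amount)   # 100
print(allowed)  # ['fish.baat', 'sendmail.cf']
```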

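Modifying strings:

re.split()  # splits the string wherever the pattern matches; the optional maxsplit argument limits the number of splits
re.sub()  # finds all matching substrings and replaces them with new content
re.subn()  # same as sub(), but returns the new string together with the number of replacements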
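A minimal sketch of the three functions on made-up input:

```python
import re

sentence = 'one, two; three four'

# Split on every run of non-word characters.
parts = re.split(r'\W+', sentence)
# maxsplit=1 stops after the first split.
limited = re.split(r'\W+', sentence, maxsplit=1)

# Replace every run of digits with '#'.
replaced = re.sub(r'\d+', '#', 'room 12, floor 3')
# subn() returns the new string and the number of replacements.
result, count = re.subn(r'\d+', '#', 'room 12, floor 3')

# Groups can be referenced in the replacement with \1, \2, ...
rewritten = re.sub(r'(\w+)@(\w+)', r'\1 at \2', 'user@host')

print(parts)          # ['one', 'two', 'three', 'four']
print(limited)        # ['one', 'two; three four']
print(replaced)       # room #, floor #
print(result, count)  # room #, floor # 2
print(rewritten)      # user at host
```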