Introduction to the re and subprocess modules

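Part 1: the re module. re is the module for working with regular expressions. A regular expression is a symbol, or a combination of symbols, that carries a special meaning.

Purpose: filtering strings. To find the content you care about in a pile of text, you have to tell the computer what your filtering rule is, and a regular expression is how you express that rule.

What the individual symbols mean (note: the matching engine behind re is not written in Python; in CPython it is implemented in C)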

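Examples:

import re

src = 'abc_defa 34_h\na'

print(re.findall('a', src))    # ['a', 'a', 'a']

print(re.findall(r'\w', src))  # \w matches a letter, digit or underscore: ['a', 'b', 'c', '_', 'd', 'e', 'f', 'a', '3', '4', '_', 'h', 'a']

print(re.findall(r'\W', src))  # \W matches anything that is not a letter, digit or underscore: [' ', '\n']

print(re.findall(r'\s', src))  # \s matches whitespace (space, tab, newline and so on): [' ', '\n']

print(re.findall(r'\S', src))  # \S matches any non-whitespace character: ['a', 'b', 'c', '_', 'd', 'e', 'f', 'a', '3', '4', '_', 'h', 'a']

print(re.findall(r'\d', src))  # \d matches a digit, for ASCII text the same as [0-9]: ['3', '4']

print(re.findall(r'\D', src))  # \D matches any non-digit: ['a', 'b', 'c', '_', 'd', 'e', 'f', 'a', ' ', '_', 'h', '\n', 'a']

print(re.findall(r'\n', src))  # \n matches only the newline character: ['\n']

print(re.findall('.', src))    # . matches any character except \n: ['a', 'b', 'c', '_', 'd', 'e', 'f', 'a', ' ', '3', '4', '_', 'h', 'a']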

\s, \w and \d each match a single character. To match repeated characters, use the quantifiers

* + ? {}

Examples:

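print(re.findall(r'\d*', '1 12 aa bb'))  # * the preceding expression occurs 0 or more times: ['1', '', '12', '', '', '', '', '', '', '']

print(re.findall(r'\d+', '1 12 333 aa bb'))  # + repeats 1 or more times: ['1', '12', '333']

print(re.findall(r'\d?', 'aa bb a1c 1c1'))  # ? repeats 0 or 1 times: ['', '', '', '', '', '', '', '1', '', '', '1', '', '1', '']

print(re.findall(r'\d{1,3}', 'aa bb a1c 1c11'))  # {m,n} sets the repeat count by hand, at least m and at most n times: ['1', '1', '11']

print(re.findall('[a-z]{3}', 'a aa aaa aaaa'))  # {m} exactly m times: ['aaa', 'aaa']

print(re.findall('[a-z]{,3}', 'a aa aaa aaaa'))  # {,m} at most m times, 0 allowed, so empty matches appear: ['a', '', 'aa', '', 'aaa', '', 'aaa', 'a', '']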

Matching alternatives and sets: | and []

Examples:

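print(re.findall('1|0|2', '1982asds'))  # a|b matches a or b: ['1', '2']

print(re.findall('[012]', '1982asds'))  # [abc] is a set of characters listed one by one; [abc] matches 'a', 'b' or 'c': ['1', '2']

Example: find all digits and letters (note: a hyphen means a range only when it sits between two characters; at either edge of the set it is an ordinary character)

print(re.findall('[0-9a-zA-Z]', '1982asds'))  # ['1', '9', '8', '2', 'a', 's', 'd', 's']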

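Start of line: ^. Inside a character set, a leading ^ negates the set instead.

Examples:

print(re.findall('^h', 'hellohelloh'))  # ^ start of line: ['h']

print(re.findall('[^0-9a-zA-Z]', '1982asds'))  # ^ negation: []

End of line: $, written at the end of the expression

Example:

print(re.findall('oh$', 'eololheullooh'))  # $ end of line: ['oh']

Word boundary: \b is the position between a word character and a non-word character (or the start or end of the string), so it marks both the beginning and the end of a word

Example:

print(re.findall(r'h\b', 'hello world hih okh'))  # h at the end of a word: ['h', 'h']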
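The word-boundary rule is easier to see with \b on both sides. The following is a small sketch on the same sample string; the last line is an added illustration of what happens when the raw-string prefix is forgotten.

```python
import re

text = 'hello world hih okh'

# \b before h: h at the start of a word ('hello' and 'hih')
print(re.findall(r'\bh', text))        # ['h', 'h']

# \b after h: h at the end of a word ('hih' and 'okh')
print(re.findall(r'h\b', text))        # ['h', 'h']

# Whole words that both start and end with h
print(re.findall(r'\bh\w*h\b', text))  # ['hih']

# Without the r prefix, '\b' is a backspace character, not a word boundary
print(re.findall('h\b', text))         # []
```

Writing every pattern as a raw string avoids this kind of silent mismatch.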

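Exercises:

1: Validate a password: at least 8 characters, at most 16, only digits, letters and underscores

import re

pwd1 = '123456789'

pwd2 = '1234'

# ^ and $ make the pattern cover the whole string, so overly long input is rejected too

print(re.findall(r'^\w{8,16}$', pwd1))  # ['123456789']

print(re.findall(r'^\w{8,16}$', pwd2))  # []

2: Validate a mobile number: length 11, digits only, first three digits from a fixed set [189 131 150]

import re

phone = '13162996258'

print(re.findall(r'^1(?:89|31|50)\d{8}$', phone))  # ['13162996258']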
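For validation, a yes/no answer is usually more convenient than a list. Below is a minimal sketch of the two exercises using re.fullmatch, which succeeds only when the entire string matches. The helper names is_valid_password and is_valid_phone are made up for this example, and re.ASCII is an added assumption that "letters and digits" means ASCII only.

```python
import re

# Hypothetical helpers; the rules are the ones stated in the two exercises.
PWD_RE = re.compile(r'\w{8,16}', re.ASCII)              # ASCII letters, digits, underscore
PHONE_RE = re.compile(r'1(?:89|31|50)\d{8}', re.ASCII)  # 189/131/150 + 8 more digits


def is_valid_password(pwd):
    # fullmatch returns None unless the whole string matches
    return PWD_RE.fullmatch(pwd) is not None


def is_valid_phone(phone):
    return PHONE_RE.fullmatch(phone) is not None


print(is_valid_password('123456789'))  # True
print(is_valid_password('1234'))       # False (too short)
print(is_valid_password('a' * 17))     # False (too long)
print(is_valid_phone('13162996258'))   # True
print(is_valid_phone('12062996258'))   # False (prefix 120 is not allowed)
```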

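Greedy matching: not a special symbol of its own, just the way * and + behave

Examples:

print(re.findall(r'\w', 'ajshskhkcd'))  # ['a', 'j', 's', 'h', 's', 'k', 'h', 'k', 'c', 'd']

print(re.findall(r'\w+', 'ajshskhkcd'))  # ['ajshskhkcd']

print(re.findall(r'\w*', 'ajshskhkcd'))  # ['ajshskhkcd', '']

A greedy quantifier keeps matching until the condition no longer holds. Put ? after the quantifier to stop greedy matching (match as few characters as the condition allows)

print(re.findall(r'\w+?', 'ajshskhkcd'))  # one character per match: ['a', 'j', 's', 'h', 's', 'k', 'h', 'k', 'c', 'd']

print(re.findall(r'\w*?', 'ajshskhkcd'))  # *? prefers the empty match; the exact list depends on the Python version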

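When do you need to stop greedy matching?

Example:

src = "<img src='www.baidupic.shuai1.jpg'><img src='www.baidupic.shuai2.jpg'><img src='www.baidupic.shuai3.jpg'>"

Use a regular expression to extract the image addresses:

print(re.findall("src='.+?'", src))

# ["src='www.baidupic.shuai1.jpg'", "src='www.baidupic.shuai2.jpg'", "src='www.baidupic.shuai3.jpg'"]

print(re.findall("src='(.+?)'", src))

# ['www.baidupic.shuai1.jpg', 'www.baidupic.shuai2.jpg', 'www.baidupic.shuai3.jpg']

print(re.findall("src='(?:.+?)'", src))

# ["src='www.baidupic.shuai1.jpg'", "src='www.baidupic.shuai2.jpg'", "src='www.baidupic.shuai3.jpg'"]

() puts part of a regular expression into a group. It does not change what the expression matches; the effect is that findall returns the content inside the parentheses instead of the whole match. (?:...) groups without capturing, so findall returns the whole match again.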
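The effect of groups on findall can be checked side by side. This is a small sketch on a shortened, made-up HTML string (the file names a.jpg and b.png are placeholders), with a greedy variant added for contrast.

```python
import re

html = "<img src='a.jpg'><img src='b.png'>"

# No capturing group: findall returns the whole match
print(re.findall(r"src='.+?'", html))           # ["src='a.jpg'", "src='b.png'"]

# One capturing group: findall returns only the group
print(re.findall(r"src='(.+?)'", html))         # ['a.jpg', 'b.png']

# Two capturing groups: findall returns a list of tuples
print(re.findall(r"src='(.+?)\.(\w+)'", html))  # [('a', 'jpg'), ('b', 'png')]

# Greedy .+ runs on to the last quote and swallows both tags
print(re.findall(r"src='(.+)'", html))          # ["a.jpg'><img src='b.png"]
```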

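Common functions of the re module:

1: findall  scans from left to right and returns all matches as a list

2: search  returns the first match, wrapped in a match object (None if nothing matches)

span=(0, 5) is the position of the match; match is the matched text

Examples:

print(re.search('hello', 'hello world hello python'))  # <re.Match object; span=(0, 5), match='hello'>

print(re.search('hello', 'hello world hello python').group())  # hello

3: match  matches only at the start of the string; the return value has the same form as for search

Examples:

print(re.match('hello', 'hello world hello python'))  # <re.Match object; span=(0, 5), match='hello'>

print(re.match('hello', 'hello world hello python').group())  # hello

For search and match, use group() to get the matched text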
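The difference between search and match shows up as soon as the target is not at the start of the string. A short sketch, using a sample string chosen for this illustration:

```python
import re

text = 'world hello python'

# search scans the whole string
m = re.search('hello', text)
print(m.group(), m.span())      # hello (6, 11)

# match only tries position 0, so it finds nothing here
print(re.match('hello', text))  # None

# Calling .group() on None raises AttributeError, so test the result first
m = re.match('hello', text)
if m is None:
    print('no match')
else:
    print(m.group())
```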

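4: split  splits a string at every match

Example:

print(re.split('hello', 'world hello python', maxsplit=0))  # maxsplit=0 means no limit: ['world ', ' python']

5: compile  wraps a regular expression in a pattern object; the benefit is that the expression can be reused

Example:

pattern = re.compile('hello')

print(pattern.search('hello world hello python'))  # <re.Match object; span=(0, 5), match='hello'>

6: sub  replaces matches

Example:

print(re.sub('hello', 'hao', 'world hello python'))  # world hao python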
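compile, split and sub combine naturally: one compiled pattern can serve several calls. A minimal sketch, assuming a made-up separator pattern and sample string, that also shows the maxsplit and count arguments:

```python
import re

# One pattern object, reused below: commas, semicolons or whitespace
sep = re.compile(r'[,;\s]+')

data = 'a, b;c  d'

print(sep.split(data))              # ['a', 'b', 'c', 'd']
print(sep.split(data, maxsplit=1))  # ['a', 'b;c  d']
print(sep.sub('|', data))           # a|b|c|d
print(sep.sub('|', data, count=2))  # a|b|c  d
```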

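Exercise: given the string below, use a regular expression to swap the positions of c and shell

src = 'c|java|python|shell'

# first split it into 3 groups

# then replace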
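One possible solution following those two steps is sketched here. The pattern is an illustration rather than the only answer: it captures the first item, the middle part including both separators, and the last item, then re-emits the groups in reverse order with backreferences.

```python
import re

src = 'c|java|python|shell'

# Three groups: first item (lazy), middle with both '|' separators, last item
pattern = r'(.+?)(\|.+\|)(.+)'

# \3\2\1 writes the groups back with the first and last swapped
print(re.sub(pattern, r'\3\2\1', src))  # shell|java|python|c
```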

============================================================

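Part 2: the subprocess module

sub: child

process: process

What is a process: a program that is currently running. Every time a program is started a process is created, and each process holds all the resources the program needs to run. Normally one process cannot access another process's data, but sometimes that is exactly what is needed, so there is an object called a pipe that exists specifically for communication between processes.

Purpose: executing system commands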

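Common functions:

1: run  runs the command, waits for it to finish, and returns an object describing the result

2: call  runs the command, waits for it to finish, and returns its exit status code

3: Popen  also returns an object, without waiting for the command; its three streams can be connected to pipes:

① stdout

② stderr

③ stdin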
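The three entry points can be compared on one command. This is a minimal sketch; the child command (the current Python interpreter printing a word) is an assumption chosen so the example does not depend on any particular shell command being installed.

```python
import subprocess
import sys

# Child command: the current interpreter, printing a fixed word
cmd = [sys.executable, '-c', "print('hi')"]

# run: waits for the command and returns a CompletedProcess object
result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(result.returncode)               # 0
print(result.stdout.decode().strip())  # hi

# call: waits and returns only the exit status
status = subprocess.call(cmd, stdout=subprocess.DEVNULL)
print(status)                          # 0

# Popen: starts the child and returns at once; pipes carry the data back
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()          # read both pipes and wait for exit
print(out.decode().strip())            # hi
```

Pipe contents arrive as bytes, which is why the output is decoded before printing.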

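Summary: the advantage of subprocess is that you can capture the result of the command you run

subprocess executes the command in a child process; with Popen the main process does not have to wait for it, so a slow command does not freeze the main process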
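The non-blocking behaviour can be observed with a deliberately slow child. The sketch below assumes a made-up command that sleeps for half a second, and the 10 second timeout is an arbitrary value picked for the example.

```python
import subprocess
import sys

# Hypothetical slow command: sleep briefly, then print
cmd = [sys.executable, '-c', "import time; time.sleep(0.5); print('done')"]

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# Popen has already returned; poll() gives None while the child is running
print(proc.poll())

try:
    # Wait for the result, but never longer than 10 seconds
    out, _ = proc.communicate(timeout=10)
except subprocess.TimeoutExpired:
    proc.kill()                  # give up on a stuck child
    out, _ = proc.communicate()  # collect whatever it wrote

print(out.decode().strip())
print(proc.returncode)
```

Between the Popen call and communicate the main process is free to do other work.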
