Selecting all permutations without repetition using a regular expression in Python

Question

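I have three classes of characters, say letters [A-Za-z], digits [0-9], and symbols [!@#$]. For the sake of argument, the particular symbols don't matter. I want to use a regular expression in Python to select all permutations of these three classes, of length 3, without repetition.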

For example, the following should match:

a1!
4B_
*x7

The following should fail:

ab!
BBB
*x_
a1!B

How can I do this without explicitly writing out every potential permutation of the classes in the regex?

I have previously tried the following solution:

import re
regex = r"""
              ([A-Za-z]|[0-9]|[!@#$])
    (?!\1)    ([A-Za-z]|[0-9]|[!@#$])
    (?![\1\2])([A-Za-z]|[0-9]|[!@#$])
    """
s = "ab1"
re.fullmatch(regex,s,re.VERBOSE)

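However, the string ab1 is matched, which is incorrect. This is because the group references \1 and \2 refer to the text actually matched by those groups, not to the regex contained in the groups.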

How, then, can I refer to the regex contained in a previously matched group rather than to its actual content?

Answer

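Your main problem is that you cannot use a backreference to negate a part of a pattern. A backreference can only match, or be negated against, the same value that was captured by the corresponding capturing group.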

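Note that [^\1] matches any character other than \x01, not any character other than what Group 1 holds, because inside a character class \1 is no longer a backreference (Python reads it as an octal escape). ab1 is matched because b is not equal to a, and 1 is neither \x01 nor \x02, which is all that [\1\2] checks for.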
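As a quick check of the point above, here is a minimal sketch showing how Python treats \1 outside and inside a character class. The variable names are only illustrative.

```python
import re

# Outside a character class, \1 is a backreference to the text
# captured by group 1, so the second character must repeat the first.
same_char = re.fullmatch(r"(a)\1", "aa")
diff_char = re.fullmatch(r"(a)\1", "ab")

# Inside a character class, \1 is an octal escape for the character \x01.
literal_x01 = re.fullmatch(r"[\1]", "\x01")

# So [^\1] accepts 'a' even though group 1 captured 'a'.
negated = re.fullmatch(r"(a)[^\1]", "aa")

print(bool(same_char), bool(diff_char), bool(literal_x01), bool(negated))
```

The last call is the reason the attempt in the question accepts strings it was meant to reject.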

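What you can use instead is a series of negative lookaheads that "exclude" a match under certain conditions, e.g. the string must not contain two digits, two letters, or two special characters.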

import re

rx = re.compile(r"""
  (?!(?:[\W\d_]*[^\W\d_]){2})      # no two letters allowed
  (?!(?:\D*\d){2})                 # no two digits allowed
  (?!(?:[^_!@\#$*]*[_!@\#$*]){2})  # no two special chars allowed
  [\w!@\#$*]{3}                    # three allowed chars
""", re.ASCII | re.VERBOSE)

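See the regex demo. In the demo, the negated character classes are replaced with .* because the test is run against a single multiline text rather than against separate strings.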

See the Python demo:

import re

passes = ['a1!', '4B_', '*x7']
fails = ['ab!', 'BBB', '*x_', 'a1!B']
rx = re.compile(r"""
  (?!(?:[\W\d_]*[^\W\d_]){2})      # no two letters allowed
  (?!(?:\D*\d){2})                 # no two digits allowed
  (?!(?:[^_!@\#$*]*[_!@\#$*]){2})  # no two special chars allowed
  [\w!@\#$*]{3}                    # three allowed chars
""", re.ASCII | re.VERBOSE)
for s in passes:
    print(s, 'should pass, result:', bool(rx.fullmatch(s)))
for s in fails:
    print(s, 'should fail, result:', bool(rx.fullmatch(s)))

Output:

a1! should pass, result: True
4B_ should pass, result: True
*x7 should pass, result: True
ab! should fail, result: False
BBB should fail, result: False
*x_ should fail, result: False
a1!B should fail, result: False
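The lookahead approach above can also be generated from a list of classes rather than written by hand. The following is only a sketch under stated assumptions: the helper name one_of_each is hypothetical, each class is given as a character-class body without brackets, and the classes are assumed to be disjoint.

```python
import re

def one_of_each(classes):
    """Build a regex that accepts one character from each class, in any order.

    `classes` holds character-class bodies without brackets, e.g. "0-9".
    Assumption: the classes do not overlap.
    """
    # One negative lookahead per class: reject two occurrences of that class.
    lookaheads = "".join(f"(?!(?:[^{c}]*[{c}]){{2}})" for c in classes)
    # Then require exactly len(classes) characters from the union of classes.
    union = "".join(classes)
    return re.compile(f"{lookaheads}[{union}]{{{len(classes)}}}")

rx_generated = one_of_each(["A-Za-z", "0-9", "_!@#$*"])

for s in ["a1!", "4B_", "*x7", "ab!", "BBB", "*x_", "a1!B"]:
    print(s, bool(rx_generated.fullmatch(s)))
```

Since the string has as many characters as there are classes and no class may occur twice, each class has to occur exactly once.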

A simpler alternative is not to write out the permutations yourself, but to let Python generate them with the help of itertools.

from itertools import permutations

patterns = [
    '[a-zA-Z]','[0-9]','[!@#$]'
]

regex = '|'.join(
    ''.join(p)
    for p in permutations(patterns)
)
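For completeness, here is a sketch of how the generated alternation might be used. It assumes re.fullmatch is the intended entry point, so no extra anchors or grouping are needed; the test strings are illustrative only.

```python
import re
from itertools import permutations

patterns = ['[a-zA-Z]', '[0-9]', '[!@#$]']

# One alternative per ordering of the three classes (3! = 6 in total).
regex = '|'.join(''.join(p) for p in permutations(patterns))
rx_perm = re.compile(regex)

results = {s: bool(rx_perm.fullmatch(s)) for s in ['a1!', '!1a', 'ab!', 'a1!B']}
print(results)
```

Note that the symbol class here is [!@#$], so strings using _ or * will not match unless that class is extended. The number of alternatives grows factorially with the number of classes, which is fine for three classes but gets large quickly.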

combinatorics python regex