python – 正则表达式捕获2引号之间的部分

当试图在引文之间抓住这句话时,我似乎无法正确使用我的正则表达式.例如.以粗体显示(注意:输入前后有字符串):

“I can quite understand your thinking so.” I said. “Of course, in
your position of unofficial adviser and helper to everybody who is
absolutely puzzled, throughout three continents, you are brought in
contact with all that is strange and bizarre. But here”

“Of course, in your position of unofficial adviser and helper to
everybody who is absolutely puzzled, throughout three continents, you
are brought in contact with all that is strange and bizarre. But
here”–I picked up the morning paper from the ground–“let us put
it to a practical test. Here is the first heading upon which I come.
‘A husband’s cruelty to his wife.’ There is half a column of print,
but I kNow without reading it that it is all perfectly familiar to me.
There is, of course, the other woman, the drink, the push, the blow,
the bruise, the sympathetic sister or landlady. The crudest of writers
Could invent nothing more crude.”

我试图在引号之前和之后得到文本,但我无法达到所需的输出.必须有一些方法可以对正则表达式进行分组,这样我就可以在引号之间以及周围的两个引号之间捕获字符串.

尝试:

import re

def get_quotes(paragraph):
    quote_rx = r'''([""])(?:(?=(\\?))\2.)*?\1'''
    return [i.group(0) for i in \
           re.finditer(quote_rx, paragraph, re.S)]

def get_said(paragraph, quote):
    quote_start = paragraph.index(quote)
    quote_end = quote_start + len(quote)
    before = paragraph[:quote_start]
    after = paragraph[quote_end:]
    return before, after


paragraphs = ['''I smiled and shook my head. "I can quite understand your thinking so." I said. "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"--I picked up the morning paper from the ground--"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I kNow without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers Could invent nothing more crude."''', 
'''Such was the remarkable narrative to which I listened on that April evening -- a narrative which would have been utterly incredible to me had it not been confirmed by the actual sight of the tall, spare figure and the keen, eager face, which I had never thought to see again. In some manner he had learned of my own sad bereavement, and his sympathy was shown in his manner rather than in his words. "Work is the best antidote to sorrow, my dear Watson," said he, "and I have a piece of work for us both to-night which, if we can bring it to a successful conclusion, will in itself justify a man's life on this planet." In vain I begged him to tell me more. "You will hear and see enough before morning," he answered. "We have three years of the past to discuss. Let that suffice until half-past nine, when we start upon the notable adventure of the empty house."''']

for p in paragraphs:
    saids = set()
    for i in get_quotes(p):
        b,a = get_said(p,i)
        print b
        print a
        print

期望的输出

in-btw: I said.
quotes: ["I can quite understand your thinking so.","Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"]
section: "I can quite understand your thinking so." **I said.** "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"


in-btw: --I picked up the morning paper from the ground--
quotes: ['''"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"''', '''"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I kNow without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers Could invent nothing more crude."''']
section: "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"**--I picked up the morning paper from the ground--**"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I kNow without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers Could invent nothing more crude."

解决方法:

这很简单,你需要的正则表达式是r’^(“[^”]“)([^”])(“[^”]“)’:

import re

s = """
"I can quite understand your thinking so." I said. "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"

"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"--I picked up the morning paper from the ground--"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I kNow without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers Could invent nothing more crude."
"""

for segment in s.splitlines():
    if not segment:
        continue
    first, said, second = re.match(r'^("[^"]+")([^"]+)("[^"]+")', segment).groups()
    print first
    print said
    print second

>>> 
"I can quite understand your thinking so."
 I said. 
"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"
"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"
--I picked up the morning paper from the ground--
"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I kNow without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers Could invent nothing more crude."

相关文章

python方向·数据分析   ·自然语言处理nlp   案例:中...
原文地址http://blog.sina.com.cn/s/blog_574a437f01019poo....
ptb数据集是语言模型学习中应用最广泛的数据集,常用该数据集...
 Newtonsoft.JsonNewtonsoft.Json是.Net平台操作Json的工具...
NLP(NaturalLanguageProcessing)自然语言处理是人工智能的一...
做一个中文文本分类任务,首先要做的是文本的预处理,对文本...