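How to extract quoted statements from a text file in Python

Problem description

I want to find all quoted statements in a text file. I wrote some code that finds the first quoted statement. However, when I try to use a while loop to traverse the whole text and find all of them, it does not work. Here is the code: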

        quoteStart = fullText.index("\"")
        quoteEnd = fullText.index("\"",quoteStart + 1)
        quotedText = fullText[quoteStart:quoteEnd+1]
        print ("{}:{}".format(quoteStart,quoteEnd))
        print (quotedText)

Output:

250:338

"When we talk about the Hiroshima and Nagasaki bombing,we never talk about Shinkolobwe,"

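How can I add a while loop to traverse the whole text?

Solution

I think the problem is that quoteStart = fullText.index("\"") always starts searching from the beginning of the text, so it finds the same first quote every time.

Try the following, which resumes each search just past the previous closing quote: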

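quoteEnd = -1

while True:
    try:
        # Look for the next opening quote after the previous closing quote.
        quoteStart = fullText.index("\"",quoteEnd+1)
        quoteEnd = fullText.index("\"",quoteStart + 1)
    except ValueError:
        # index() raises ValueError when no further quote is found.
        break

    quotedText = fullText[quoteStart:quoteEnd+1]
    print ("{}:{}".format(quoteStart,quoteEnd))
    print (quotedText)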

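It is always good to provide a minimal working example; that is, if you had included a sample of what fullText contains, this question would be easier to answer.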
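As a minimal, self-contained sketch of the loop above, the following wraps it in a function and runs it on a short sample string. The function name find_quoted and the sample text are assumptions made for illustration; they are not part of the original question.

```python
def find_quoted(full_text):
    """Return (start, end, text) for each double-quoted span in full_text."""
    results = []
    quote_end = -1
    while True:
        try:
            # Search for the next opening quote after the previous closing quote.
            quote_start = full_text.index('"', quote_end + 1)
            quote_end = full_text.index('"', quote_start + 1)
        except ValueError:
            # No further complete pair of quotes remains.
            break
        results.append((quote_start, quote_end, full_text[quote_start:quote_end + 1]))
    return results


# Hypothetical sample text standing in for fullText.
sample = 'He said "hello" and she said "bye".'
for start, end, text in find_quoted(sample):
    print("{}:{}".format(start, end))
    print(text)
```

Returning a list instead of printing inside the loop makes the result easy to reuse; an unmatched trailing quote is simply ignored because the second index() call raises ValueError.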

You do not need a while loop. A regular expression would be a simpler solution.

Suppose fullText = '"When we talk about the Hiroshima and Nagasaki bombing,we never talk about Shinkolobwe," was what one said and "I agree." was what another said.'

You can use a regular expression as shown below.

import re

quotedText = re.findall(r'"([^"]*)"',fullText)

print(quotedText)

Result:

['When we talk about the Hiroshima and Nagasaki bombing,we never talk about Shinkolobwe,', 'I agree.']

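r'"([^"]*)"' is a raw string holding a regular expression that matches an opening double quote, then any number of characters other than a double quote, then a closing double quote. The parentheses form a capturing group, so re.findall returns only the text between the quotes, without the quotes themselves.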
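If the positions are needed as well, as in the output shown in the question, re.finditer yields match objects that carry them. The following is a small sketch using the same pattern; the sample string is made up for illustration.

```python
import re

# Hypothetical sample text standing in for fullText.
sample = 'He said "hello" and she said "bye".'

matches = []
for m in re.finditer(r'"([^"]*)"', sample):
    # m.start() is the index of the opening quote;
    # m.end() - 1 is the index of the closing quote.
    matches.append((m.start(), m.end() - 1, m.group(1)))
    print("{}:{}".format(m.start(), m.end() - 1))
    print(m.group(0))  # group(0) includes the quotes, group(1) does not
```

Here m.end() points one past the match, hence the - 1 to get the same closing-quote index that the index()-based loop reports.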


python quotes while-loop