Is there a way in Python to count sentences that contain quotes, question marks, and full stops?

Problem description

I have been looking for a solution to this problem. I am writing a custom function to count the number of sentences in a text. I tried textstat and nltk for this, but the two give me different counts.

An example sentence looks like this:

Annie said, "Are you sure? How is it possible? you are joking, right?"

NLTK gives me --> count=3.

['Annie said, "Are you sure?', 'How is it possible?', 'you are joking, right?"']

Another example:

Annie said, "It will work like this! you need to go and confront your friend. Okay!"

NLTK gives me --> count=3.

Please suggest a fix. The expected count is 1 in both cases, because each example is a single sentence of direct speech.

Solution

I wrote a simple function that does what you want:

def sentences_counter(text: str):

    end_of_sentence = ".?!…"
    # complete with whatever end of a sentence punctuation mark I might have forgotten
    # you might for instance want to add '\n'.

    sentences_count = 0
    sentences = []
    inside_a_quote = False
    
    start_of_sentence = 0
    last_end_of_sentence = -2
    for i, char in enumerate(text):
        
        # quote management, to solve your issue: text between double quotes is skipped
        if char == '"':
            inside_a_quote = not inside_a_quote
            if not inside_a_quote and text[i-1] in end_of_sentence: # ?
                last_end_of_sentence = i                            # ?
        elif inside_a_quote:
            continue

        # basic management of sentences with the punctuation marks in `end_of_sentence`
        if char in end_of_sentence:
            last_end_of_sentence = i
        elif last_end_of_sentence == i-1:
            sentences.append(text[start_of_sentence:i].strip())
            sentences_count += 1
            start_of_sentence = i
    
    # same as the last block, in case the text does not end with a punctuation mark
    # (strip first so trailing whitespace is not counted as an empty sentence)
    last_sentence = text[start_of_sentence:].strip()
    if last_sentence:
        sentences.append(last_sentence)
        sentences_count += 1
    
    return sentences_count, sentences

Consider the following:

text = '''Annie said,"Are you sure? How is it possible? you are joking,right?" No,I'm not... I thought you were'''

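To generalize your problem a little, I added two more sentences: one that ends with an ellipsis, and a last one that has no end punctuation at all. Now, if I run this: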

sentences_count, sentences = sentences_counter(text)
print(f'{sentences_count} sentences detected.')
print(f'The detected sentences are: {sentences}')

I get this:

3 sentences detected.
The detected sentences are: ['Annie said,"Are you sure? How is it possible? you are joking,right?"', "No,I'm not...", 'I thought you were']

I think it works quite well.

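Note: the quote handling in my solution assumes American-style quotation, where a sentence's end punctuation mark can sit inside the quotes. To disable this behavior, remove the two lines I flagged with a trailing "# ?" comment.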
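To make the effect of the note above concrete, here is a minimal sketch that turns the two flagged lines into a switch. The name `count_sentences` and the `american_quotes` parameter are hypothetical and not part of the function above; the sketch returns only the count and otherwise follows the same logic.

```python
def count_sentences(text: str, american_quotes: bool = True) -> int:
    """Count sentences, skipping everything between double quotes.

    `american_quotes` is a hypothetical switch that stands in for the two
    flagged lines: when True, a closing quote that directly follows an
    end mark also ends the sentence.
    """
    end_marks = ".?!…"
    count = 0
    inside_quote = False
    start = 0      # index where the current sentence starts
    last_end = -2  # index of the most recent sentence-ending character

    for i, char in enumerate(text):
        if char == '"':
            inside_quote = not inside_quote
            if american_quotes and not inside_quote and text[i - 1] in end_marks:
                last_end = i
        elif inside_quote:
            continue  # ignore punctuation inside the quotation

        if char in end_marks:
            last_end = i
        elif last_end == i - 1:
            # first character after a run of end marks: one sentence is complete
            count += 1
            start = i

    # count the remainder unless it is empty or whitespace only
    if text[start:].strip():
        count += 1
    return count


quoted = 'Annie said, "Are you sure? How is it possible? you are joking, right?"'
followed = 'Annie said, "Are you sure?" Bob did not answer.'

print(count_sentences(quoted))                            # 1
print(count_sentences(followed))                          # 2
print(count_sentences(followed, american_quotes=False))   # 1
```

With the switch on, the closing quote after "sure?" ends the first sentence, so the text that follows is counted separately. With it off, only end marks outside the quotes end a sentence, so the whole line counts as one.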
