Is there a way in Python to count sentences that contain quotation marks, question marks, and periods?

Problem description

I have been looking for a solution to this problem. I am writing a custom function to count the number of sentences in a text. I tried textstat and nltk for this, but the two gave me different counts.

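An example of such a sentence:

Annie said, "Are you sure? How is it possible? you are joking, right?"

NLTK gives me --> count=3

['Annie said, "Are you sure?', 'How is it possible?', 'you are joking, right?"']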

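Another example:

Annie said, "It will work like this! you need to go and confront your friend. Okay!"

NLTK gives me --> count=3

Please advise. The expected count is 1, because each example is a single sentence of direct speech.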

Solution

I wrote a simple function that does what you want:

def sentences_counter(text: str):

    end_of_sentence = ".?!…"
    # complete with whatever end of a sentence punctuation mark I might have forgotten
    # you might for instance want to add '\n'.

    sentences_count = 0
    sentences = []
    inside_a_quote = False
    
    start_of_sentence = 0
    last_end_of_sentence = -2
    for i,char in enumerate(text):
        
        # quote management,to solve your issue
        if char == '"':
            inside_a_quote = not inside_a_quote
            if not inside_a_quote and text[i-1] in end_of_sentence: # ?
                last_end_of_sentence = i                            # ?
        elif inside_a_quote:
            continue

        # basic management of sentences with the punctuation marks in `end_of_sentence`
        if char in end_of_sentence:
            last_end_of_sentence = i
        elif last_end_of_sentence == i-1:
            sentences.append(text[start_of_sentence:i].strip())
            sentences_count += 1
            start_of_sentence = i
    
    # flush whatever follows the last detected boundary, e.g. a final sentence
    # with no end punctuation mark; strip first so trailing whitespace is not counted
    last_sentence = text[start_of_sentence:].strip()
    if last_sentence:
        sentences.append(last_sentence)
        sentences_count += 1
    
    return sentences_count,sentences

Consider the following text:

text = '''Annie said,"Are you sure? How is it possible? you are joking,right?" No,I'm not... I thought you were'''

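To generalize your problem a bit, I added two more sentences: one ending with an ellipsis, and a last one with no end punctuation mark at all. Now, if I run this: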

sentences_count,sentences = sentences_counter(text)
print(f'{sentences_count} sentences detected.')
print(f'The detected sentences are: {sentences}')

I get this:

3 sentences detected.
The detected sentences are: ['Annie said,"Are you sure? How is it possible? you are joking,right?"', "No,I'm not...", 'I thought you were']

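I think it works pretty well.

Note: the quote handling in my solution follows American-style quotation, where the punctuation mark that ends the sentence may sit inside the quotation marks. To disable this behavior, delete the two lines marked with a trailing "# ?" comment.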
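As a rough sketch of the note above, the quote handling could also be switched on and off with a parameter instead of by deleting lines. The name count_sentences and the parameter punctuation_inside_quotes are hypothetical and not part of the answer's function. The sketch also assumes that curly quotation marks (“ and ”) should be treated like straight ones, which the original function does not do.

```python
END_OF_SENTENCE = ".?!…"
OPENING_QUOTES = '"\u201c'  # straight quote and left curly quote (assumption)
CLOSING_QUOTES = '"\u201d'  # straight quote and right curly quote (assumption)


def count_sentences(text, punctuation_inside_quotes=True):
    """Return (count, sentences), treating quoted passages as opaque."""
    sentences = []
    inside_a_quote = False
    start = 0
    last_end = -2  # index of the most recent sentence-ending character

    for i, char in enumerate(text):
        if inside_a_quote:
            if char in CLOSING_QUOTES:
                inside_a_quote = False
                # American style: '?"' ends the sentence at the closing quote
                if punctuation_inside_quotes and text[i - 1] in END_OF_SENTENCE:
                    last_end = i
            continue  # everything inside the quote is ignored

        if char in OPENING_QUOTES:
            inside_a_quote = True

        if char in END_OF_SENTENCE:
            last_end = i
        elif last_end == i - 1:
            # first character after a run of end marks: close the sentence
            sentences.append(text[start:i].strip())
            start = i

    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return len(sentences), sentences


example = 'She asked, "Why?" He shrugged.'
print(count_sentences(example))
print(count_sentences(example, punctuation_inside_quotes=False))
```

With the flag left on, the closing quote after "Why?" ends the first sentence, so the example splits in two. With the flag off, the quoted question mark is ignored and the whole text counts as one sentence. Either way, both examples from the question count as a single sentence.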