Python statistics.median returns "StatisticsError: no median for empty data" despite being given a list of 20 items

Problem

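I'm a third-year economics major who recently picked up a computer science minor. Suffice it to say, I know very little about coding, but I'm currently taking a data structures course and ran into this problem while coding one of my assignments: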

def median_expense(transactions):
    """Return the median value of transaction amounts."""
    transvalues = []
    for item in transactions:
        transvalues.append(item[1])
    return statistics.median(transvalues)


def significant_transactions(transactions,n_trailing=10):
    """Return a list of significant transactions.

    A transaction is significant if the amount is greater than or equal to
    five times of the median spending for a trailing number of transactions
    """
    sigtrans = []
    trail = 0
    for item in transactions:
        if item in transactions[0:n_trailing-1]:
            pass
        else:
            print(transactions[(trail-n_trailing):trail-1])
            med = median_expense(transactions[(trail-n_trailing):trail-1])
            if item[1] >= 5 * med:
                sigtrans.append(item)
        trail += 1
    return sigtrans

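transactions is a list made up of many instances of the namedtuple "Transaction", which is defined as Transaction = namedtuple("Transaction", ["time", "amount", "company", "phone"]). These are extracted from a text file, which another function processes into the list transactions.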

The function is called from a separate validator function:
def test_significant_transactions():
    """Testing significant_transactions"""
    module = import_file("fraud.py")
    transactions = module.load_transactions("transactions.txt")
    returned = module.late_night_transactions(transactions)
    Transaction = module.Transaction

    expected = [
        Transaction(time="2019-11-09 19:35:55", amount=181.75, company="White-Carr", phone="+1-683-988-9471x923"),
        Transaction(time="2020-06-29 04:31:39", amount=47.73, company="Moore-Oliver", phone="+1-956-998-4999x4202"),
        Transaction(time="2021-08-30 02:30:08", amount=150.32, company="Kiss Kiss Nyrt.", phone="+36 49 013-1271"),
    ]

    returned = module.significant_transactions(transactions,20)
    assert set(expected) == set(returned)
    assert len(module.significant_transactions(transactions,5)) == 34
    assert len(module.significant_transactions(transactions,10)) == 10
    assert len(module.significant_transactions(transactions,15)) == 9

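The problem I'm running into is that, in the significant_transactions function, statistics.median() always raises StatisticsError: no median for empty data. I've tried printing just the list I pass to it, and it prints out a full list of 20 tuples. I don't understand which part of my code causes this list to disappear before it is passed to statistics.median().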

Some example tuples from the text file, since the original file has hundreds of them:

2019-07-10 00:53:16 | $18.84 | Mccarty Inc | +1-656-321-9087
2019-07-10 10:45:35 | $53.19 | Miller, Taylor and Brennan | +1-133-495-8787x11296
2019-07-11 14:47:00 | $28.88 | Thomas-Ochoa | +1-127-502-6419
2019-07-12 00:06:10 | $5.43 | Gonzalez, Perry and Martinez | +1-207-627-7386x43758
2019-07-13 17:02:56 | $12.39 | Fazekas Márton Kht. | +36 24 197-2587
2019-07-14 22:11:02 | $1.51 | Marshall, Reed and Decker | +1-865-728-7544
2019-07-15 07:04:02 | $36.71 | Garcia-Ho | +1-213-595-4661x89568
2019-07-16 10:25:37 | $19.85 | Martin Inc | +1-370-678-8277x7188
2019-07-16 18:19:43 | $1.93 | Glass, Oconnor and Harris | +1-550-792-2702x310
2019-07-17 02:01:20 | $4.19 | Dalton-Robinson | +1-053-420-4309x78603
2019-07-18 05:23:29 | $59.89 | Stein Group | +1-097-265-7703

Solution

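I think your problem lies in the logic of your loop. I'm not sure what you are trying to achieve, but hopefully my comments give you some insight into what is going wrong:

Here is your code: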

def significant_transactions(transactions,n_trailing=10):
    """Return a list of significant transactions.

    A transaction is significant if the amount is greater than or equal to
    five times of the median spending for a trailing number of transactions
    """
    sigtrans = []
    trail = 0
    for item in transactions:
        if item in transactions[0:n_trailing-1]:
            pass
        else:
            print(transactions[(trail-n_trailing):trail-1])
            med = median_expense(transactions[(trail-n_trailing):trail-1])
            if item[1] >= 5 * med:
                sigtrans.append(item)
        trail += 1
    return sigtrans

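Now, in the first if, if item in transactions[0:n_trailing-1], you do nothing for the first n_trailing - 1 elements (the slice stops before index n_trailing - 1), right? Do you know that pass just means "do nothing"? For those elements, you only increment trail. Is that what you want?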

If you want to skip the rest of the loop body (including trail += 1) for those first elements, you want to use continue instead.
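To make the difference concrete, here is a minimal sketch; the function names and the sample list are made up for illustration and are not part of your assignment. With pass, the statements after the if still run; with continue, the loop jumps straight to the next iteration.

```python
def visited_with_pass(items):
    """Collect every item: pass does nothing, so the append still runs."""
    visited = []
    for x in items:
        if x < 3:
            pass  # placeholder statement; execution falls through
        visited.append(x)
    return visited


def visited_with_continue(items):
    """Collect only items >= 3: continue skips the rest of the loop body."""
    visited = []
    for x in items:
        if x < 3:
            continue  # jump to the next iteration, skipping the append
        visited.append(x)
    return visited


print(visited_with_pass([1, 2, 3, 4]))      # [1, 2, 3, 4]
print(visited_with_continue([1, 2, 3, 4]))  # [3, 4]
```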

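That part of the code is nearly equivalent to the following, except that this version skips the first n_trailing items rather than the first n_trailing - 1: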

trail = n_trailing
for item in transactions[n_trailing:]:
    print(transactions[(trail-n_trailing):trail-1])
    med = median_expense(transactions[(trail-n_trailing):trail-1])
    if item[1] >= 5 * med:
        sigtrans.append(item)
    trail += 1

In any case, your error says StatisticsError: no median for empty data, which suggests that the problem is in the call to median_expense: median_expense(transactions[(trail-n_trailing):trail-1])

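I think the error is that the slice from (trail - n_trailing) to trail - 1 contains no items. On the first iteration that reaches the else branch, trail is still smaller than n_trailing, so the start index is negative and Python counts it from the end of the list. The start then lies past the stop index, the slice is empty, and the code tries to compute the median of empty data.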

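With an array of 20 items and n_trailing = 5, your code will:

  1. Increment trail for the first four items, so that trail = 4.
  2. Print transactions[(4 - 5):4 - 1] = transactions[-1:3], which is an empty list.
  3. Crash in median_expense, because the list passed to it is empty.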
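The steps above can be reproduced in isolation. This is only a sketch: data is a stand-in list of 20 integers rather than your real transactions, and trail is set by hand to the value it has when the else branch is first reached.

```python
import statistics

data = list(range(20))  # stand-in for a 20-item transactions list

n_trailing = 5
trail = 4  # value of trail when the first non-skipped item is reached

# The start index is -1, which Python resolves to index 19; the stop index
# is 3, so the start lies past the stop and the slice is empty.
window = data[trail - n_trailing:trail - 1]  # data[-1:3]
print(window)  # []

try:
    statistics.median(window)
except statistics.StatisticsError as exc:
    print(exc)  # no median for empty data
```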

Again, I'm not sure what you want to achieve, but is this code close to what you want?

def significant_transactions(transactions,n_trailing=10):
    """Return a list of significant transactions.

    A transaction is significant if the amount is greater than or equal to
    five times of the median spending for a trailing number of transactions
    """
    sigtrans = []
    trail = n_trailing
    for item in transactions[n_trailing:]:
        print(transactions[(trail - n_trailing):trail-1])
        med = median_expense(transactions[(trail-n_trailing):trail-1])
        if item[1] >= 5 * med:
            sigtrans.append(item)
        trail += 1
    return sigtrans
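One more thought, offered as a sketch rather than a definitive fix: the slice ending at trail-1 leaves out the transaction immediately before the current one, so the window holds n_trailing - 1 items. If the intended window is the full n_trailing transactions before the current one (that is my assumption; check it against your assignment), enumerate removes the need for a separate trail counter. The sample data below is made up.

```python
import statistics
from collections import namedtuple

Transaction = namedtuple("Transaction", ["time", "amount", "company", "phone"])


def significant_transactions(transactions, n_trailing=10):
    """Return transactions whose amount is >= 5x the median of the
    n_trailing transactions immediately before them (assumed window)."""
    sigtrans = []
    for i, item in enumerate(transactions):
        if i < n_trailing:
            continue  # not enough history yet
        window = transactions[i - n_trailing:i]  # start index is never negative
        med = statistics.median(t.amount for t in window)
        if item.amount >= 5 * med:
            sigtrans.append(item)
    return sigtrans


# Hypothetical data: five small purchases, then one large one.
sample = [Transaction(f"t{i}", 10.0, "Shop", "000") for i in range(5)]
sample.append(Transaction("t5", 60.0, "Shop", "000"))
print(significant_transactions(sample, n_trailing=5))
```

Because the window always starts at i - n_trailing with i >= n_trailing, the slice can no longer wrap around to the end of the list.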