Finding invoices with a zero total and no discount in a Pandas DataFrame

Problem description

In Pandas, I have a DataFrame like this:

| Division |  Invoice |   Transactions | Amount |
|----------|----------|----------------|--------|
|   Europe | 10000000 | Product Charge |   1000 |
|   Europe | 10001000 | Product Charge |   1000 |
|   Europe | 10001000 |       discount |   -500 |
|    Latam | 10002000 | Product Charge |      0 |
|    Latam | 10003000 | Product Charge |   1000 |
|    Latam | 10003000 |       discount |  -1000 |
|   Europe | 10004000 | Product Charge |    500 |
|   Europe | 10004000 |       discount |   -500 |
|   Europe | 10005000 | Product Charge |    500 |
|   Europe | 10005000 |       discount |    495 |
|    Latam | 10006000 | Product Charge |      0 |
|    Latam | 10007000 | Product Charge |      0 |
|    Latam | 10007000 |  Loyalty bonus |    200 |

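For invoices where the invoice amount is 0 and there is no discount, I need to build a new DataFrame with the sum and the count for each division, like this:

| Division |  Invoice | Total | Q_Invoice |
|----------|----------|-------|-----------|
|    Latam | 10002000 |     0 |         1 |
|    Latam | 10006000 |     0 |         1 |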

In SQL I can compute it as follows, but I cannot reproduce it with a Pandas DataFrame:

SELECT Division, Invoice, SUM(Amount) Total, COUNT(DISTINCT Invoice) Q_Invoice
FROM df
GROUP BY Division, Invoice
HAVING SUM(CASE WHEN Transactions = 'discount' THEN 1 ELSE 0 END) = '0'
       AND SUM(CASE WHEN Transactions = 'Product Charge' THEN 1 ELSE 0 END) >= '1'
       AND SUM(Amount) = 0

I tried to reproduce the result above in a Jupyter notebook using pandasql, but it does not work. This is the query I used:

import pandasql as ps
import pandas as pd

df2 = ps.sqldf("""SELECT Division, COUNT(DISTINCT Invoice) Q_Invoice
                  FROM df
                  GROUP BY Division, Invoice
                  HAVING SUM(CASE WHEN Transactions = 'discount' THEN 1 ELSE 0 END) = '0'
                  AND SUM(CASE WHEN Transactions = 'Product Charge' THEN 1 ELSE 0 END) >= '1'
                  AND SUM(Amount) = 0 """)

I don't know how to proceed; I'm new to pandas.

Solution

I hope I have understood your question correctly. You can reshape the DataFrame with .pivot_table and then filter on the per-invoice aggregates:

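# One row per (Division, Invoice); one sum and one count column per transaction type
x = df.pivot_table(
    index=["Division", "Invoice"],
    columns="Transactions",
    values="Amount",
    aggfunc=["sum", "count"],
    fill_value=0,
)
# Keep invoices with no discount row and a zero total across all transaction types
# (filtering on the Product Charge sum alone would also keep invoice 10007000)
x = x[x[("count", "discount")].eq(0) & x["sum"].sum(axis=1).eq(0)].reset_index()
# Flatten the two-level column index, e.g. ("sum", "Product Charge") -> "sum_Product Charge"
x.columns = x.columns.map("_".join)
x = x.rename(
    columns={"Division_": "Division", "Invoice_": "Invoice",
             "sum_Product Charge": "Total", "count_Product Charge": "Q_Invoice"}
)[["Division", "Invoice", "Total", "Q_Invoice"]]
print(x)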

Prints:

  Division   Invoice  Total  Q_Invoice
0    Latam  10002000      0          1
1    Latam  10006000      0          1
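
As an alternative sketch (not part of the original answer), the three HAVING conditions from the SQL query can be mirrored more directly with groupby and named aggregation. The helper column names is_discount, is_charge, n_discount and n_charge are made up for illustration, and the sample data is copied from the table in the question.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame(
    {
        "Division": ["Europe", "Europe", "Europe", "Latam", "Latam", "Latam", "Europe",
                     "Europe", "Europe", "Europe", "Latam", "Latam", "Latam"],
        "Invoice": [10000000, 10001000, 10001000, 10002000, 10003000, 10003000, 10004000,
                    10004000, 10005000, 10005000, 10006000, 10007000, 10007000],
        "Transactions": ["Product Charge", "Product Charge", "discount", "Product Charge",
                         "Product Charge", "discount", "Product Charge", "discount",
                         "Product Charge", "discount", "Product Charge", "Product Charge",
                         "Loyalty bonus"],
        "Amount": [1000, 1000, -500, 0, 1000, -1000, 500, -500, 500, 495, 0, 0, 200],
    }
)

# Boolean helper columns (hypothetical names) stand in for the CASE WHEN expressions
flags = df.assign(
    is_discount=df["Transactions"].eq("discount"),
    is_charge=df["Transactions"].eq("Product Charge"),
)

# One row per (Division, Invoice), like GROUP BY Division, Invoice
agg = flags.groupby(["Division", "Invoice"], as_index=False).agg(
    Total=("Amount", "sum"),
    n_discount=("is_discount", "sum"),  # summing booleans counts the True rows
    n_charge=("is_charge", "sum"),
)

# HAVING: no discount rows, at least one Product Charge row, and a total of zero
keep = agg["n_discount"].eq(0) & agg["n_charge"].ge(1) & agg["Total"].eq(0)

# Each group is a single invoice, so COUNT(DISTINCT Invoice) is always 1 here
result = (
    agg[keep]
    .assign(Q_Invoice=1)[["Division", "Invoice", "Total", "Q_Invoice"]]
    .reset_index(drop=True)
)
print(result)
```

Here Total is the sum over all transaction types of the invoice, as in the SQL SUM(Amount), rather than the Product Charge sum only. For the two invoices that pass the filter the two values coincide.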
