Problem description
I have a dataset "banks". If I group by the "job" column to check the count in each category, I get the following:
index | job | count |
---|---|---|
0 | admin. | 478 |
1 | blue-collar | 946 |
2 | entrepreneur | 168 |
3 | housemaid | 112 |
4 | management | 969 |
5 | retired | 230 |
6 | self-employed | 183 |
7 | services | 417 |
8 | student | 84 |
9 | technician | 768 |
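I have also added the first 3 rows of the dataset I'm using:
age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no
33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure,no
35,management,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure,no
My goal is to write a small function that I can reuse for other columns as well, so I tried creating one with the "dfply" package.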
import pandas as pd
import dfply
from dfply import *

# creating the function
@dfpipe
def woe_iv(df, variable):
    step1 = df >> group_by(X.variable) >> summarize(COUNT=X.variable.count())
    return step1

# invoking the function
banks >> woe_iv(X.job)
@dfpipe
def woe_iv(df, variable):
    step1 = df >> group_by(X.variable) >> summarize(COUNT=X.variable.count())
    return step1

banks >> woe_iv(X.job)
Traceback (most recent call last):
  File "<ipython-input-46-d851aeac1927>", line 7, in <module>
    banks>>woe_iv(X.job)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 142, in __rrshift__
    result = self.function(other_copy)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 149, in <lambda>
    return pipe(lambda x: self.function(x, *args, **kwargs))
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 329, in __call__
    return self.function(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 282, in __call__
    return self.function(df, **kwargs)
  File "<ipython-input-46-d851aeac1927>", line 5, in woe_iv
    step1=df>>group_by(X.variable)>>summarize(COUNT=X.variable.count())
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 279, in __call__
    args = self._recursive_arg_eval(df, args[1:])
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 241, in _recursive_arg_eval
    return [
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 242, in <listcomp>
    self._symbolic_to_label(df, a) if i in eval_as_label
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 231, in _symbolic_to_label
    return self._evaluator_loop(df, arg, self._evaluate_label)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 225, in _evaluator_loop
    return eval_func(df, arg)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 181, in _evaluate_label
    arg = self._evaluate(df, arg)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 175, in _evaluate
    arg = arg.evaluate(df)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 71, in evaluate
    return self.function(context)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 74, in <lambda>
    return Intention(lambda x: getattr(self.function(x), attribute),
  File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 5139, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'variable'
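Please let me know if I'm missing anything.
Solution
Shameek Mukherjee, is this the correct interpretation and indentation of your sample code? Apart from the indentation, I can't find any difference between the two versions.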
import dfply
from dfply import *

@dfpipe
def woe_iv(df, variable):
    step1 = df >> group_by(X.variable) >> summarize(COUNT=X.variable.count())
    return step1

banks >> woe_iv(X.job)
Second example:
@dfpipe
def woe_iv(df, variable):
    step1 = df >> group_by(X.variable) >> summarize(COUNT=X.variable.count())
    return step1

banks >> woe_iv(X.job)
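Thanks for providing the sample data. The root cause is that inside woe_iv() you wrote X.variable instead of X[variable]. X.variable is attribute access, so dfply looks for a column literally named "variable", which doesn't exist; that is what raises "AttributeError: 'DataFrame' object has no attribute 'variable'". X[variable] uses the value held in the parameter instead, so the column is then passed by name as a string, e.g. woe_iv('marital'):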
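import pandas as pd
from dfply import *

@dfpipe
def woe_iv(df, variable):
    # X[variable] looks up the column whose name is stored in `variable`
    return df >> group_by(X[variable]) >> summarize(COUNT=X[variable].count())

banks = pd.read_excel('banks.xlsx')
>>> print(banks >> woe_iv('marital'))
   marital  COUNT
0  married      2
1   single      1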
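The same attribute-versus-item distinction exists in plain pandas, so the error can be reproduced without dfply at all. A minimal sketch, assuming a hypothetical three-row frame that holds only the marital column from the sample data:

```python
import pandas as pd

# Hypothetical frame mirroring the "marital" column of the sample data.
df = pd.DataFrame({"marital": ["married", "married", "single"]})

variable = "marital"  # the column name is stored in a Python variable

# Item access uses the *value* of `variable`, i.e. the column "marital".
counts = df[variable].value_counts()
print(counts.to_dict())  # {'married': 2, 'single': 1}

# Attribute access looks for a column literally named "variable",
# which does not exist, so pandas raises AttributeError.
try:
    df.variable
    attr_error = None
except AttributeError as exc:
    attr_error = str(exc)
print(attr_error)
```

The attribute form only works when the column name is known when the code is written; a reusable function has to use the item form.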
If you'd rather not use dfply pipes, plain pandas gives the same result:
>>> banks.groupby(['marital']).size().reset_index(name='COUNT')
   marital  COUNT
0  married      2
1   single      1
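Since the goal was a small function reusable across columns, the plain-pandas one-liner can be wrapped and still chained pipe-style through DataFrame.pipe. A sketch; the name count_by and the two-column frame are hypothetical:

```python
import pandas as pd

def count_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count rows per category of `column` (hypothetical helper)."""
    # groupby(...).size() counts rows per group; reset_index turns the
    # group keys back into a column and names the counts COUNT.
    return df.groupby(column).size().reset_index(name="COUNT")

# Hypothetical subset of the sample data (marital and loan columns only).
banks = pd.DataFrame({
    "marital": ["married", "married", "single"],
    "loan": ["no", "yes", "no"],
})

marital_counts = count_by(banks, "marital")  # direct call
loan_counts = banks.pipe(count_by, "loan")   # pipe-style chaining
print(marital_counts)
print(loan_counts)
```

DataFrame.pipe passes the frame as the first argument, so the helper reads left to right much like the dfply version.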
Or, if you are familiar with SQL, use PandaSQL:
from pandasql import sqldf
SQL_Query = sqldf('''select marital, count(*) AS COUNT
from banks group by marital''', locals())
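PandaSQL runs its queries through SQLite, so the same grouping can also be sketched with only the standard-library sqlite3 module plus pandas' to_sql and read_sql_query. The in-memory database and the table name banks are assumptions for illustration:

```python
import sqlite3

import pandas as pd

# Hypothetical subset of the sample data.
banks = pd.DataFrame({
    "job": ["unemployed", "services", "management"],
    "marital": ["married", "married", "single"],
})

# Copy the frame into a throwaway in-memory SQLite database.
conn = sqlite3.connect(":memory:")
banks.to_sql("banks", conn, index=False)

query = """
SELECT marital, COUNT(*) AS "COUNT"
FROM banks
GROUP BY marital
ORDER BY marital
"""
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```

Copying into SQLite is extra work compared with groupby, so this is mainly worthwhile when the query is easier to express in SQL.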
Sample data:
>>> print(banks)
age job marital education default balance housing loan \
0 30 unemployed married primary no 1787 no no
1 33 services married secondary no 4789 yes yes
2 35 management single tertiary no 1350 yes no
contact day month duration campaign pdays previous poutcome y
0 cellular 19 oct 79 1 -1 0 unknown no
1 cellular 11 may 220 1 339 4 failure no
2 cellular 16 apr 185 1 330 1 failure no
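Because banks.xlsx isn't included, the three rows above can be rebuilt inline to make the examples reproducible. A sketch; the values are copied from the print-out, and the list of columns looped over at the end is an arbitrary choice:

```python
import pandas as pd

# The three sample rows shown above, rebuilt without banks.xlsx.
banks = pd.DataFrame({
    "age": [30, 33, 35],
    "job": ["unemployed", "services", "management"],
    "marital": ["married", "married", "single"],
    "education": ["primary", "secondary", "tertiary"],
    "default": ["no", "no", "no"],
    "balance": [1787, 4789, 1350],
    "housing": ["no", "yes", "yes"],
    "loan": ["no", "yes", "no"],
    "contact": ["cellular", "cellular", "cellular"],
    "day": [19, 11, 16],
    "month": ["oct", "may", "apr"],
    "duration": [79, 220, 185],
    "campaign": [1, 1, 1],
    "pdays": [-1, 339, 330],
    "previous": [0, 4, 1],
    "poutcome": ["unknown", "failure", "failure"],
    "y": ["no", "no", "no"],
})

# The same per-category count, applied to several columns by name.
job_counts = banks.groupby("job").size()
for column in ["job", "marital", "loan"]:
    print(column, banks.groupby(column).size().to_dict())
```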