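Applying a function to a Pandas DataFrame

Problem description

I have code (C01) that calculates the 21-period moving average of a given stock on the Brazilian stock exchange (IBOV, B3). I then wrote a for loop that decides the asset is in an uptrend when the moving average has risen six times in a row (a simplification, since many more variables could determine this).

However, I want to run this loop for several assets, as in C02. That is, I want to apply a function to each column of my DataFrame and return only the names of the assets that are in an uptrend (in this case, the column names). I tried turning the for loop into a function and applying it to every column with Pandas 'apply' (axis = 1; I also tried axis = 'columns'). But I made a mistake when writing the function. When I run the function with apply, I get the message "ValueError: Lengths must match to compare". How can I fix this?

Thanks for your attention.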

import numpy as np
import pandas as pd
from pandas_datareader import data as wb
from mpl_finance import candlestick_ohlc
from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

#STOCK
ativo = 'WEGE3.SA'
acao2 = ativo.upper()

#START AND END ANALYSIS
inicio = '2020-1-1'
fim = '2021-1-27'

#MAKE DATAFRAME
df00 = wb.DataReader(acao2,data_source='yahoo',start=inicio,end=fim)

df00.index.names = ['Data']
df= df00.copy(deep=True)
df['Data'] = df.index.map(mdates.date2num)

# MOVING AVERAGE
df['ema21'] = df['Close'].ewm(span=21,adjust=False).mean()
df['ema72'] = df['Close'].ewm(span=72,adjust=False).mean()

#DF PLOT
df1=df
df2=df[-120:]

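#TREND RULE
# alta stays 1 only if the EMA never fell over the last 6 steps
alta=1
for i in range(6):
  if(df2['ema21'].iloc[-i-1] < df2['ema21'].iloc[-i-2]):
    alta=0

# baixa stays 1 only if the EMA never rose over the last 6 steps
baixa=1
for i in range(6):
  if(df2['ema21'].iloc[-i-1] > df2['ema21'].iloc[-i-2]):
    baixa=0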

if (alta==1 and baixa==0):
  a1 = ativo.upper()+ ' HIGH TREND'
elif (alta==0 and baixa==1):
  a1 = ativo.upper()+ ' LOW TREND!'
else:
  a1 = ativo.upper()+ ' UNDEFINED'
  
#PLOT RESULTS
print("---------------------------------------") 
print(a1)
print("---------------------------------------")

ohlc = df[['Data','Open','High','Low','Close']]

f1,ax = plt.subplots(figsize=(14,8))

# plot the candlesticks
candlestick_ohlc(ax,ohlc.values,width=.6,colorup='green',colordown='red')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))

label_ = acao2.upper() + ' EMA21'
label_2 = acao2.upper() + ' EMA72'
ax.plot(df.index,df1['ema21'],color='black',label=label_)
ax.plot(df.index,df1['ema72'],color='blue',label=label_2)

ax.legend()
ax.grid(True)

plt.title(acao2.upper() + ' : Gráfico Diário')
plt.show(block=True)

#C02

#START/END ANALYSIS
inicio = '2020-1-1'
fim = '2021-1-27'

#STOCKS
ativos = ['SAPR11.SA','WEGE3.SA']

#DATAFRAME
mydata = pd.DataFrame()
for t in ativos:
    mydata[t] = wb.DataReader(t,data_source='yahoo',start=inicio,end=fim)['Close']
df2 = mydata

#MOVING AVERAGE
df3 = df2.apply(lambda x: x.rolling(window=21).mean())

#MAKE FUNCTION
def trend(x):
  tendencia_alta=1
  for i in range(6):
    if(df3.columns[-i-1:] > df3.columns[-i-2:]):
      tendencia_alta=0

  print()
  if (alta==1 and baixa==0):
      a1 = ativo.upper()+ ' HIGH TREND'
  elif (alta==0 and baixa==1):
      a1 = ativo.upper()+ ' LOW TREND!'
  else:
      a1 = ativo.upper()+ ' UNDEFINED'

#TRYING TO APPLY THE FUNCTION IN EVERY DF3 COLUMN
df3.apply(trend,axis=1)

Solution

Something like this:

def myfunc(x):
   #do things here where x is the group of rows sent to function
   #instead of df['column'],you'll use x['column'] 
   #because you are passing the rows into x
   return x

df.groupby('yourcolumn').apply(myfunc)
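
For the wide layout in C02 (one column per asset), a column-wise apply may be closer to what the question asks. The ValueError most likely comes from the line that compares df3.columns[-i-1:] with df3.columns[-i-2:]: those are two slices of the column labels with different lengths, not values of the moving average. Also, axis=1 hands each row to the function, while the default axis=0 hands it each column. Below is a minimal sketch, assuming the same six-step rule as in C01. The tickers and prices are made up so the example runs without downloading anything, and the periods parameter is my own addition.

```python
import numpy as np
import pandas as pd


def trend(col: pd.Series, periods: int = 6) -> str:
    """Classify one column of moving-average values using the C01 rule."""
    # Keep the last periods + 1 valid values, which gives `periods` steps.
    last = col.dropna().iloc[-(periods + 1):]
    steps = last.diff().dropna()
    if len(steps) < periods:
        return 'UNDEFINED'
    alta = (steps >= 0).all()   # the average never fell
    baixa = (steps <= 0).all()  # the average never rose
    if alta and not baixa:
        return 'HIGH TREND'
    if baixa and not alta:
        return 'LOW TREND'
    return 'UNDEFINED'


# Hypothetical closing prices standing in for the downloaded data.
idx = pd.date_range('2021-01-01', periods=40, freq='D')
prices = pd.DataFrame({
    'UP.SA': np.linspace(10, 20, 40),
    'DOWN.SA': np.linspace(20, 10, 40),
    'MIXED.SA': np.where(np.arange(40) % 2 == 0, 10.0, 12.0),
}, index=idx)

# 21-period moving average of every column at once.
ma = prices.rolling(window=21).mean()

# Default axis=0: trend() is called once per column and receives a Series.
result = ma.apply(trend)

# Keep only the column names whose label is an uptrend.
uptrend = result[result == 'HIGH TREND'].index.tolist()
print(result)
print(uptrend)
```

With real data you would replace prices with the mydata frame built in C02. The function only reads the Series it receives, so it does not depend on the global names alta, baixa or ativo.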