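Problem description
I am new to Python and don't know all of its aspects.
I want to loop through a dataframe (2D) and assign some of its values to an xarray (3D).
The coordinates of my xarray are company ticker (1), financial variable (2), and daily date (3).
For each company, the dataframe's columns are some of the same financial variables as in the xarray, and its index is made up of quarterly dates.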
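My goal is to take an already generated dataframe for each company, look up a value in the column of a certain variable and the row of a certain date, and assign it to the corresponding position in the xarray.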
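Since some dates will not appear in the dataframe's index (there are only 4 dates per calendar year), I would like to assign to the xarray either 0 or the value at the previous date in the xarray, if that value is not 0.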
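I have tried doing this with nested for loops, but going through all the dates for just one variable takes about 20 seconds.
My date list consists of about 8000 dates, my variable list has about 30 variables, and my company list has about 800 companies.
If I were to loop over all of these, the nested for loops would take several days to finish.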
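Is there a faster way to assign these values to the xarray? My guess is something like iterrows() or iteritems(), but for xarray.
Here is sample code from my program, with shorter lists of companies and variables: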
import pandas as pd
from datetime import datetime,date,timedelta
import numpy as np
import xarray as xr
import time
start_time = time.time()
# We create the df. This is an auxiliary, made-up df; it is a shorter version of the real df.
# The real df I want to use is much larger and comes from an external method.
cols = ['cashAndCashEquivalents','shortTermInvestments','cashAndShortTermInvestments','totalAssets','totalLiabilities','totalStockholdersEquity','netIncome','freeCashFlow']
rows = []
for year in range(1989,2020):
    # Quarter-end days: 31 March, 30 June, 30 September, 31 December
    for month,day in zip([3,6,9,12],[31,30,30,31]):
        rows.append(date(year,month,day))
a = np.random.randint(100,size=(len(rows),len(cols)))
df = pd.DataFrame(data=a,columns=cols)
df.insert(column='date',value=rows,loc=0)
# This is just to convert the date column to datetimes so that I can later look up the values
df['date'] = pd.to_datetime(df['date'])
df.set_index('date',inplace=True)
# Coordinates for the xarray:
companies = ['AAPL'] # This is actually longer (around 800 companies),but for the sake of the question,it is limited to just one company.
variables = ['totalAssets','totalStockholdersEquity'] # Same as with the companies (around 30 variables).
first_date = date(1998,3,25)
last_date = date.today() + timedelta(-300)
dates = pd.date_range(start=first_date,end=last_date).tolist()
# We create a zero xarray,so that we can later fill it up with values:
z = np.zeros((len(companies),len(variables),len(dates)))
ds = xr.DataArray(z,coords=[companies,variables,dates],dims=['companies','variables','dates'])
# We assign values from the df to the ds
for company in companies:
    for variable in variables:
        first_value_found = False
        for current_date in dates:
            # Dates in the df are quarterly dates and dates in the ds are daily dates.
            # We start off by looking for a certain date in the df. If we don't find it, we give it the value 0 in the ds.
            # If we do find it, we assign it the value found in the df and record that the first value has been found.
            # Once the first value has been found, when we don't find a value in the df, instead of giving it a value of 0, we give it the value of the previous date.
            if not first_value_found:
                try:
                    ds.loc[company,variable,current_date] = df.loc[current_date,variable]
                    first_value_found = True
                except KeyError:
                    ds.loc[company,variable,current_date] = 0
            else:
                try:
                    ds.loc[company,variable,current_date] = df.loc[current_date,variable]
                except KeyError:
                    ds.loc[company,variable,current_date] = ds.loc[company,variable,current_date + timedelta(-1)]
print("My program took",time.time() - start_time,"to run")
The main problem lies in the for loops: I have tested them in a separate file, and they appear to be the most time-consuming part.
Solution
One possible strategy is to loop over the dates that are actually present in the DataFrame's index, rather than over every possible date:
avail_dates = df.index
for date in avail_dates:
    # Copy the data
This should already reduce the number of iterations considerably. You still need to make sure that all the gaps are filled, so you would do something like:
da.loc[company,variables,date:] = df.loc[date,variables]
That's right, you can index both the DataArray and the DataFrame with lists. (As an aside, I wouldn't use ds as a variable name for something from xarray that is not a DataSet.)
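To make the idea above concrete, here is a minimal sketch of looping over only the quarterly dates and writing each row forward to every later day. The small `df`, the 2019 date range and the name `quarter_date` are made up for illustration; the reshape of the row into a column is an assumption about how the values need to be shaped so that they broadcast along the date axis, and it is not part of the one-liner above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical small inputs standing in for the question's data.
companies = ["AAPL"]
variables = ["totalAssets", "totalStockholdersEquity"]
dates = pd.date_range("2019-01-01", "2019-12-31")  # daily dates

# Quarterly DataFrame for one company (made-up numbers).
quarter_ends = pd.to_datetime(
    ["2019-03-31", "2019-06-30", "2019-09-30", "2019-12-31"]
)
df = pd.DataFrame(
    {
        "totalAssets": [10, 20, 30, 40],
        "totalStockholdersEquity": [1, 2, 3, 4],
        "netIncome": [5, 6, 7, 8],
    },
    index=quarter_ends,
)

da = xr.DataArray(
    np.zeros((len(companies), len(variables), len(dates))),
    coords=[companies, variables, dates],
    dims=["companies", "variables", "dates"],
)

company = "AAPL"
# Only 4 iterations per year instead of ~365.
for quarter_date in df.index:
    # One value per variable for this quarter.
    values = df.loc[quarter_date, variables].to_numpy(dtype=float)
    # Write the values to this date and every later date; later quarters
    # overwrite the tail, which gives the forward fill. The extra axis
    # lets the (n_variables, 1) block broadcast along the date axis.
    da.loc[company, variables, quarter_date:] = values[:, np.newaxis]
```

Days before the first quarterly date keep their initial 0, which seems to match the behaviour described in the question.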
What you probably want to use, though, is pandas.DataFrame.reindex().
If I understand what you are trying to do, this should more or less do the trick (untested).
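The snippet that belonged here is not preserved, so the following is only a sketch of what a reindex-based approach might look like, not the original code. It assumes a hypothetical dict `frames` mapping each ticker to its quarterly DataFrame, forward-fills each frame onto the daily dates, replaces the leading gaps with 0, and writes one whole company slab at a time instead of one cell at a time.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical small inputs standing in for the question's data.
companies = ["AAPL"]
variables = ["totalAssets", "totalStockholdersEquity"]
dates = pd.date_range("2019-01-01", "2019-12-31")  # daily dates

quarter_ends = pd.to_datetime(
    ["2019-03-31", "2019-06-30", "2019-09-30", "2019-12-31"]
)
# `frames` is an assumed container: ticker -> quarterly DataFrame.
frames = {
    "AAPL": pd.DataFrame(
        {
            "totalAssets": [10, 20, 30, 40],
            "totalStockholdersEquity": [1, 2, 3, 4],
            "netIncome": [5, 6, 7, 8],
        },
        index=quarter_ends,
    )
}

da = xr.DataArray(
    np.zeros((len(companies), len(variables), len(dates))),
    coords=[companies, variables, dates],
    dims=["companies", "variables", "dates"],
)

for company, frame in frames.items():
    # Align the quarterly rows to the daily dates, carrying the last known
    # value forward; days before the first quarter become NaN, then 0.
    filled = frame[variables].reindex(dates, method="ffill").fillna(0)
    # filled has shape (n_dates, n_variables); transpose it to match the
    # (variables, dates) layout of one company slab.
    da.loc[{"companies": company}] = filled.to_numpy().T
```

With this layout the only Python-level loop left is over companies, so the work per company is done by pandas and NumPy rather than by per-date iteration.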