Vectorization: how to avoid two for loops?

Problem description

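With this post I am looking for input on vectorizing my Python code, which currently uses two for loops. For performance reasons I want to avoid the for loops. My current Python code is shown below.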

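What does the code do? I have an input dataframe whose column c1 has 4 rows with the value 10 and 3 rows with the value 20. Column c2 holds some arbitrary numbers.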

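Expected output: my window size is 2. So within the rows where c1 = 10, and separately within the rows where c1 = 20, I need to take each block of 2 consecutive rows and compute the mean of the corresponding c2 values (a leftover single row at the end of a group is averaged on its own). I have attached screenshots of the input and the expected output.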

At the moment I use two for loops to achieve this.

[Screenshots: input dataframe; expected output]
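The screenshots are not available here, so as a stand-in, this is a minimal pure-Python sketch of the computation described above. It assumes the seven input rows reconstructed from the description (c1 = 10 four times, c1 = 20 three times, c2 = 10, 20, ..., 70); the function name `window_means` is hypothetical. It mirrors the loop version: group by c1 value, then average c2 over non-overlapping blocks of `window` rows.

```python
# Hypothetical rows reconstructed from the description: (c1, c2) pairs.
rows = [(10, 10), (10, 20), (10, 30), (10, 40), (20, 50), (20, 60), (20, 70)]

def window_means(rows, window=2):
    """Return (c1, mean of c2) for each non-overlapping block of `window` rows.

    Rows are grouped by c1 in order of first appearance; the last block of a
    group may hold fewer than `window` rows and is averaged on its own.
    """
    groups = {}  # c1 -> list of its c2 values, in original row order
    for c1, c2 in rows:
        groups.setdefault(c1, []).append(c2)

    result = []
    for c1, values in groups.items():
        for start in range(0, len(values), window):
            block = values[start:start + window]
            result.append((c1, sum(block) / len(block)))
    return result

print(window_means(rows))
# [(10, 15.0), (10, 35.0), (20, 55.0), (20, 70.0)]
```

These four values are the same ones the pandas answer below produces.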

My current Python code:

import pandas as pd
data = [{'c1':10,'c2':10},{'c1':10,'c2':20},{'c1':10,'c2':30},{'c1':10,'c2':40},
        {'c1':20,'c2':50},{'c1':20,'c2':60},{'c1':20,'c2':70}]
df = pd.DataFrame(data) # df = Input
df.head()
 
window = 2
frames = []  # one result frame per c1 value
records = df['c1'].unique()

for x in records:
    intervalsDF = pd.DataFrame(columns=['c1','meanc2'])
    df2 = df.loc[df['c1'] == x]
    # walk through this c1 group in non-overlapping blocks of `window` rows
    for i in range(0,len(df2),window):
        intervalIndex = len(intervalsDF)
        interval = df2[i:i+window]
        c1 = interval['c1'].iloc[0]
        meanc2 = interval['c2'].mean()
        intervalSummary = [c1,meanc2]
        intervalsDF.loc[intervalIndex] = intervalSummary
    frames.append(intervalsDF)

# DataFrame.append was removed in pandas 2.0, so concatenate the pieces instead
allDF = pd.concat(frames) # allDF is the expected output
allDF.head()

Solution

There may be a shorter, simpler way to do this transformation, but here is one approach that avoids loops.

# create the data frame, as per the original post
import pandas as pd
data = [{'c1':10,'c2':10},{'c1':10,'c2':20},{'c1':10,'c2':30},{'c1':10,'c2':40},
        {'c1':20,'c2':50},{'c1':20,'c2':60},{'c1':20,'c2':70}]
df = pd.DataFrame(data) # df = Input

# 1. convert the index to an ordinary column
df = df.reset_index()

# 2. 'helper' is a column that counts 0, 1, 2, 3, ...
#     and restarts for each c1 (assumes the default RangeIndex
#     and that the rows for each c1 are contiguous)
df['helper'] = df['index'] - df.groupby('c1')['index'].transform('min')

# 3. integer division on 'helper', to get 0, 0, 1, 1, ...
#    (identifies non-overlapping pairs; 2 is the window size)
df['helper'] //= 2

# 4. now convert 'index' from ordinary column back to an Index
df = df.set_index('index')

# 5. compute the mean of c2 for each value of 'c1' and each pair of observations
df = df.groupby(['c1','helper'])['c2'].mean()

# 6. re-order 'helper' and 'c1' to match order in output
df.index = df.index.swaplevel()

print(df)

helper  c1
0       10    15.0
1       10    35.0
0       20    55.0
1       20    70.0
Name: c2, dtype: float64
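A possibly shorter variant, sketched here as an assumption rather than as part of the answer above: `GroupBy.cumcount()` numbers the rows 0, 1, 2, ... within each c1 group directly, so the `reset_index` / `transform('min')` / `set_index` steps collapse into one line, and the window size becomes a variable instead of a hard-coded 2. The column names `block` and `meanc2` are illustrative choices.

```python
import pandas as pd

# Same input as in the question (reconstructed from the description).
df = pd.DataFrame({
    "c1": [10, 10, 10, 10, 20, 20, 20],
    "c2": [10, 20, 30, 40, 50, 60, 70],
})

window = 2

# cumcount() gives 0, 1, 2, ... per c1 group in original row order;
# integer division by the window size turns that into a block number.
block = (df.groupby("c1").cumcount() // window).rename("block")

# Group by c1 and block number, average c2 in each block, flatten to columns.
out = df.groupby(["c1", block])["c2"].mean().reset_index(name="meanc2")
print(out)
```

Unlike the `index - transform('min')` step, `cumcount` does not rely on the rows of each c1 value being adjacent or on a default RangeIndex. Whether it is faster than the answer's version on real data has not been measured here.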
