Problem description
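I have several CSV files, each with two columns of values, as shown below:
I use the following Python code to compute the R² value and plot the data.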
import glob

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

for filepath in glob.iglob(r'*.csv'):
    print(filepath)
    df = pd.read_csv(filepath)
    x_values = df["LMP"]
    y_values = df["LMP_old"]
    correlation_matrix = np.corrcoef(x_values, y_values)
    correlation_xy = correlation_matrix[0, 1]
    r_squared = correlation_xy**2
    plt.scatter(x_values, y_values)
    plt.xlabel('Predicted LMP')
    plt.ylabel("Actual LMP")
    plt.title(r_squared)
    plt.xlim(20000, 26000)
    plt.ylim(20000, 26000)
    x = np.linspace(20000, 26000)
    plt.plot(x, x, linestyle='solid')
    plt.grid(True)
    plt.savefig(filepath + ".png")
    print(r_squared)
    with open(filepath + ".txt", "w") as text_file:
        print(f"{r_squared}", file=text_file)
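However, I found that x_values and y_values are not reset after each loop iteration. Instead, they seem to remember the values from the previous iteration and keep accumulating. What command do I need so that x_values and y_values are independent/reset on every iteration?
Thank you very much.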
Solution
import glob

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

for filepath in glob.iglob(r'*.csv'):
    print(filepath)
    df = pd.read_csv(filepath)
    x_values = df["LMP"]
    y_values = df["LMP_old"]
    correlation_matrix = np.corrcoef(x_values, y_values)
    correlation_xy = correlation_matrix[0, 1]
    r_squared = correlation_xy**2
    plt.scatter(x_values, y_values)
    plt.xlabel('Predicted LMP')
    plt.ylabel("Actual LMP")
    plt.title(r_squared)
    plt.xlim(20000, 26000)
    plt.ylim(20000, 26000)
    x = np.linspace(20000, 26000)
    plt.plot(x, x, linestyle='solid')
    plt.grid(True)
    plt.savefig(filepath + ".png")
    # Adding this line solves the issue: it closes the current figure,
    # so the next file is drawn on a fresh one instead of on top of this one.
    plt.close()
    print(r_squared)
    with open(filepath + ".txt", "w") as text_file:
        print(f"{r_squared}", file=text_file)
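A note on why closing the figure helps: x_values and y_values are reassigned on every iteration, so the data itself is not accumulating. What carries over is pyplot's current figure, which plt.scatter keeps drawing onto until it is closed or cleared. As an alternative, here is a minimal sketch that gives each dataset its own figure explicitly through plt.subplots(). The helper name plot_r_squared, its parameters, the title format, and the choice of the "Agg" backend are assumptions made for illustration and are not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_r_squared(x_values, y_values, out_path):
    """Hypothetical helper: plot one dataset on its own figure and save it."""
    r_squared = np.corrcoef(x_values, y_values)[0, 1] ** 2

    fig, ax = plt.subplots()  # a fresh figure and axes for this dataset only
    ax.scatter(x_values, y_values)
    ax.set_xlabel("Predicted LMP")
    ax.set_ylabel("Actual LMP")
    ax.set_title(f"R^2 = {r_squared:.4f}")
    lims = (20000, 26000)
    ax.set_xlim(lims)
    ax.set_ylim(lims)
    ax.plot(lims, lims, linestyle="solid")  # y = x reference line
    ax.grid(True)
    fig.savefig(out_path)
    plt.close(fig)  # release the figure so nothing carries over to the next call
    return r_squared
```

Inside the original loop this could be called as plot_r_squared(df["LMP"], df["LMP_old"], filepath + ".png"). Because every call creates and closes its own figure, no plot can inherit points from an earlier file.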