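Problem description
My goal: collect options chain data from yahoo_fin and compile the data for each "Contract Name" row so that I can calculate 14-day highs and lows.
My plan: every day, use a loop to create an options chain DataFrame and export each "Contract Name" row to its own CSV so the data points accumulate. Then I would use the "...rolling(window=14).max()" command to calculate the high and ".min()" to calculate the low.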
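Below is a minimal sketch of the rolling-window step in the plan. It uses made-up daily Mark values for one contract, so the numbers are illustrative and are not market data. Keep in mind that rolling(window=14) counts rows. It therefore matches "14 days" only when there is exactly one row per day. On a DatetimeIndex, a time-based window such as "14D" counts calendar days instead. A full 14-row window returns NaN until 14 rows exist.

```python
import pandas as pd

# Hypothetical daily Mark values for a single contract (illustrative only).
dates = pd.date_range("2021-04-01", periods=20, freq="D")
mark = pd.Series(range(20, 0, -1), index=dates, dtype=float, name="Mark")

window = 14
# Row-count windows: NaN for the first 13 rows because fewer than 14 values exist.
period_high = mark.rolling(window=window).max()
period_low = mark.rolling(window=window).min()

# Calendar-day window on the DatetimeIndex: produces a value from the first row on.
period_low_14d = mark.rolling("14D").min()

summary = pd.DataFrame({
    "Mark": mark,
    "Period High": period_high,
    "Period Low": period_low,
    "Period Low (14D)": period_low_14d,
})
print(summary.tail(3))
```

Choose between the two variants based on how you want gaps to count. Weekends and holidays produce no rows, so a 14-row window covers 14 trading days. A "14D" window covers 14 calendar days.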
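My problem: I can create each DataFrame, but I don't know how to export each row to a CSV based on its "Contract Name". I also need to add the data to the earlier data points without overwriting them. I'm not sure this is the best way to approach the problem, either, so please suggest a better solution if there is one.
My code is as follows:
I use a loop to fetch the options chain data and create a DataFrame for each ticker in the list defined in the variable: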
import pandas as pd
from yahoo_fin import options

listshare = ('GOOGL','NFLX')
length_1 = len(listshare)
i = 0
while i < length_1:
    print(listshare[i] + " is uploading data")
    locals()[str(listshare[i])+"_df"] = options.get_calls(listshare[i])
    i += 1
Then I use the following code to calculate the midpoint (Mark). Mark is the value whose highs and lows I need to calculate:
from datetime import date

listshare = ('GOOGL','NFLX')
df_list = (GOOGL_df,NFLX_df)
length_2 = len(df_list)
i = 0
while i < length_2:
    ##Generate variables
    df_list[i]['Ticker'] = listshare[i]
    today = date.today()
    bid = df_list[i]['Bid']
    ask = df_list[i]['Ask']
    df_list[i]['Mark'] = ask - ((ask-bid)/2)
    mark = df_list[i]['Mark']
    df_list[i]['Date'] = today
    i += 1
This is what the DataFrame looks like:
   Ticker        Date        Contract Name  Strike     Mark     Bid     Ask  % Change  Open Interest  Implied Volatility
0    NFLX  2021-04-28  NFLX210430C00270000   270.0  236.750  235.15  238.35         -              0             284.77%
1    NFLX  2021-04-28  NFLX210430C00300000   300.0  206.750  205.15  208.35         -              2             240.63%
2    NFLX  2021-04-28  NFLX210430C00380000   380.0  126.750  125.15  128.35         -              3             140.23%
3    NFLX  2021-04-28  NFLX210430C00385000   385.0  121.750  120.15  123.35   -15.39%              2             134.57%
4    NFLX  2021-04-28  NFLX210430C00400000   400.0  106.875  105.40  108.35         -             30             125.49%
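As a quick check of the Mark formula, the sketch below recomputes it from the Bid and Ask of the first and last rows in the table above. It also shows that ask - (ask - bid) / 2 is algebraically the plain midpoint (bid + ask) / 2.

```python
import pandas as pd

# Bid/Ask values copied from rows 0 and 4 of the table above.
quotes = pd.DataFrame({
    "Contract Name": ["NFLX210430C00270000", "NFLX210430C00400000"],
    "Bid": [235.15, 105.40],
    "Ask": [238.35, 108.35],
})

# The formula used in the question.
quotes["Mark"] = quotes["Ask"] - (quotes["Ask"] - quotes["Bid"]) / 2
# The same value written as a simple midpoint.
quotes["Midpoint"] = (quotes["Bid"] + quotes["Ask"]) / 2
print(quotes)
```

Both columns reproduce the Mark values shown in the table (236.750 and 106.875), up to floating-point rounding.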
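Thanks for your insight.
Update: I've figured out how to create a separate CSV file for each "Contract Name" and append to it, using the code below. My current problem is that I can't use the earlier "Mark" column data to generate new columns. I want those columns to show the lowest price over a given period and the price change per period. I read online that this might happen because CSV files are comma-separated values, and I tried .astype(float), but it didn't work. Thanks for your insight!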
from yahoo_fin import options
from datetime import date
import pandas as pd
import os
import csv

today = date.today()
listshare = ('GOOGL','NFLX')
length_1 = len(listshare)
i = 0
# Iterate using a loop
while i < length_1:
    exp_dates = options.get_expiration_dates(listshare[i])
    info = {}
    for date in exp_dates:
        print(str(listshare[i] + ' calls: ' + date))
        info[date] = options.get_calls(listshare[i],date)
        info[date][['Date']] = today
        info[date][['Expiration']] = date
        info[date].set_index('Date',inplace = True)
        #variables
        info[date]['Ticker'] = listshare[i]
        info[date]['Bid'] = info[date]['Bid'].astype(float)
        info[date]['Ask'] = info[date]['Ask'].astype(float)
        bid = info[date]['Bid'].astype(float)
        ask = info[date]['Ask'].astype(float)
        info[date]['Mark'] = ask - ((ask-bid)/2)
        info[date][['Mark']] = info[date]['Mark'].astype(float)
        mark = info[date][['Mark']]
        info[date][['Period Low']] = mark
        relevant_1 = info[date][info[date]["Bid"] > 0.04]
        relevant = relevant_1[relevant_1['Strike'] %10 == 0]
        rel = relevant[['Ticker','Contract Name','Expiration','Strike','Bid','Mark','Ask','Period Low']]
        #print(rel)
        groupby = rel.groupby('Contract Name')
        for n,g in groupby:
            csv_filename = "{}.csv".format(n)
            csv = g.to_csv(index=True)
            #print(csv_filename) - "Contract Name.csv"
            #print(g) - "Data points (data frame) grouped by Contract name"
            #print(n) - "Contract Name"
            #check if file exist,if so append,if not create
            if os.path.exists(csv_filename):
                #open file and append current day's options chain
                with open(csv_filename,'a') as csvfile:
                    print('opening & Appending: ' + str(csv_filename))
                    g.to_csv(csv_filename,mode='a',header=False)
                    csvfile.close()
                    print('Done.')
                #convert csv to df
                df = pd.read_csv(csv_filename)
                #define variables
                info[date]['Ticker'] = listshare[i]
                low_period = 14
                df[['Mark']] = df['Mark'].astype(float)
                mark = df[['Mark']].astype(float)
                #append dataframe (add current day's data & add 14d low column)
                print('Converting ' + str(csv_filename) + ' to data frame.')
                df[['Mark']] = df[['Mark']].astype(float)
                mark = df[['Mark']].astype(float)
                df[['Period Low']] = mark.rolling(window = low_period).min()
                df[['Period Low']] = df[['Period Low']].astype(float)
                period_low = df[['Period Low']].astype(float)
                df[['Period High']] = mark.rolling(window = low_period).max()
                df[['Period High']] = df[['Period High']].astype(float)
                period_high = df[['Period High']].astype(float)
                df[['Previous Low']] = df[['Period Low']].shift(+1)
                previous_low = df[['Previous Low']]
                df[['Mark Delta %']] = ((mark - (mark.shift(+1)))/mark.shift(+1))
                df[['Mark Delta %']] = df[['Mark Delta %']].astype(float)
                mark_delta = df[['Mark Delta %']].astype(float)
                df.set_index('Date',inplace = True)
                #overwrite existing csv from dataframe
                print('Updating ' + str(csv_filename) + ' from data frame.')
                with open(csv_filename,'w') as out_file:
                    df.to_csv(csv_filename)
                    print(str(csv_filename) + ' is complete.')
            else:
                #Create new csv file
                print('Creating: ' + str(csv_filename))
                g.to_csv(csv_filename)
    i += 1
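The append-or-create step does not need a separate open() call, because DataFrame.to_csv opens and closes the file itself when given a path. Below is a minimal sketch under a few stated assumptions. The helper name append_snapshot is hypothetical, the columns are reduced to Date, Contract Name and Mark, and the snapshots are made-up values written to a temporary directory.

```python
import os
import tempfile

import pandas as pd

def append_snapshot(group: pd.DataFrame, path: str) -> None:
    """Hypothetical helper: append one contract's rows, writing the header only for a new file."""
    write_header = not os.path.exists(path)
    # mode="a" appends; to_csv handles opening and closing the file.
    group.to_csv(path, mode="a", header=write_header, index=False)

# Two illustrative daily snapshots for the same contract (values are made up).
day1 = pd.DataFrame({"Date": ["2021-04-28"], "Contract Name": ["NFLX210430C00400000"], "Mark": [106.875]})
day2 = pd.DataFrame({"Date": ["2021-04-29"], "Contract Name": ["NFLX210430C00400000"], "Mark": [104.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "NFLX210430C00400000.csv")
    for snapshot in (day1, day2):
        # One file per contract, named after the "Contract Name" value.
        for name, g in snapshot.groupby("Contract Name"):
            append_snapshot(g, os.path.join(tmp, f"{name}.csv"))
    history = pd.read_csv(path)  # read before the temporary directory is deleted

print(history)
```

Because the header is written only once, read_csv parses Mark as a numeric column when every value in the file is numeric.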
P.S. I also need a process that overwrites rows with the same Date value.
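One way to handle the P.S. is to append as usual and then keep only the last row for each date before rewriting the file. Below is a minimal sketch using drop_duplicates on illustrative data, where the same day was fetched twice.

```python
import pandas as pd

# Illustrative history for one contract in which 2021-04-29 was fetched twice.
history = pd.DataFrame({
    "Date": ["2021-04-28", "2021-04-29", "2021-04-29"],
    "Mark": [106.875, 104.50, 105.25],
})

# keep="last" keeps the most recently appended row for each Date.
deduped = history.drop_duplicates(subset="Date", keep="last").reset_index(drop=True)
print(deduped)
```

If all contracts share a single file, the duplicate key would need to be both columns, i.e. subset=["Date", "Contract Name"].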
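Solution
Take this with a grain of salt, but I spent some time on this and have a few observations and suggestions. My code is below, based on what I think you need. You'll have to check that the end result is what you want, because some of your code is hard to follow.
I kept your logic of creating one file per contract, but I think that adds a lot of overhead compared with using a single file or DataFrame where that makes sense. Paging through many files like this is very time-consuming when you want to analyze the data or look for signals and alerts.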
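To illustrate the single-table alternative, here is a minimal sketch that assumes one long history with one row per contract per day. The contract names CONTRACT_A and CONTRACT_B and all values are hypothetical. Grouping by "Contract Name" keeps each contract's rolling window separate.

```python
import pandas as pd

# Hypothetical long-format history: one row per contract per day (made-up values).
history = pd.DataFrame({
    "Date": pd.to_datetime(["2021-04-26", "2021-04-27", "2021-04-28"] * 2),
    "Contract Name": ["CONTRACT_A"] * 3 + ["CONTRACT_B"] * 3,
    "Mark": [10.0, 8.0, 9.0, 5.0, 7.0, 6.0],
})

window = 14
# Rolling windows assume rows are in date order within each contract.
history = history.sort_values(["Contract Name", "Date"])
grouped = history.groupby("Contract Name")["Mark"]

# transform returns results aligned to the original rows, so they can be assigned as columns.
# min_periods=1 yields partial-window values until 14 rows exist for a contract.
history["Period Low"] = grouped.transform(lambda s: s.rolling(window, min_periods=1).min())
history["Period High"] = grouped.transform(lambda s: s.rolling(window, min_periods=1).max())
print(history)
```

With everything in one frame, a query such as "every contract trading at its period low today" becomes a single filter instead of a loop over files.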
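The module you're using returns DataFrames, so just work with those. The simplest approach is to fetch the calls for each expiration date, concatenate the resulting frames, and then operate on the combined frame. My code does exactly that.
There's no need for a while loop; just use a list and a for statement. In my code I also limit the number of groups with a slice, for testing. You can remove that.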
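The same idea also replaces the locals()[... + "_df"] trick from the question: keep the frames in a dict keyed by ticker and loop over it directly. In the sketch below, fetch_calls is a hypothetical offline stand-in for options.get_calls so that the example runs without network access.

```python
import pandas as pd

def fetch_calls(symbol: str) -> pd.DataFrame:
    """Hypothetical stand-in for options.get_calls(symbol); returns a tiny fake chain."""
    return pd.DataFrame({"Contract Name": [f"{symbol}_C1"], "Bid": [1.0], "Ask": [1.2]})

listshare = ["GOOGL", "NFLX"]

# A dict keyed by ticker replaces dynamically named variables and the index-based while loop.
frames = {symbol: fetch_calls(symbol) for symbol in listshare}

for symbol, df in frames.items():
    df["Ticker"] = symbol
    df["Mark"] = (df["Bid"] + df["Ask"]) / 2

# All tickers in one frame for later analysis.
combined = pd.concat(list(frames.values()), ignore_index=True)
print(combined)
```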
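There's no need to define and redefine the Series variables. That may just be personal preference, but I removed all of them.
Your problem with floats isn't a CSV or parsing issue. If you dig into the data, you'll see that where Bid or Ask has no value, it contains a dash (-). Once you replace the dash and convert to float, you don't need all those extra conversions.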
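Here is a small reproduction of that failure on illustrative string data, shaped like a column that mixes numbers with "-" placeholders. It shows the replace-then-convert fix alongside pd.to_numeric with errors="coerce", which turns the placeholders into NaN instead of 0.0.

```python
import pandas as pd

# Illustrative raw quotes: numbers stored as strings, with "-" where a quote is missing.
raw = pd.DataFrame({"Bid": ["235.15", "-", "0.05"], "Ask": ["238.35", "1.10", "-"]})

try:
    raw["Bid"].astype(float)
except ValueError as exc:
    print("astype(float) fails:", exc)

# Replace the dash with a number, then convert.
replaced = raw.replace("-", "0.0").astype(float)

# Alternative: convert anything non-numeric to NaN.
coerced = raw.apply(pd.to_numeric, errors="coerce")
print(replaced)
print(coerced)
```

The difference matters for the period low. A missing quote stored as 0.0 looks like a real price of zero and can pull a rolling minimum down to 0, whereas NaN keeps it marked as missing. In the answer's code, the Bid > 0.04 filter drops rows whose bid became 0.0.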
Anyway, you have a good process; this should tighten it up a bit.
from datetime import date, timedelta
import os

import pandas as pd
from yahoo_fin import options

# A negative number of days only relabels today's quotes with an earlier date; zero is today
today = date.today() + timedelta(days=0)
listshare = ['GOOGL','NFLX']
for symbol in listshare:
    exp_dates = options.get_expiration_dates(symbol)
    df_hold_list = []
    for expdate in exp_dates:
        # print(str(symbol + ' calls: ' + expdate))
        df = options.get_calls(symbol,expdate)
        df['Date'] = today
        df['Expiration'] = expdate
        df['Ticker'] = symbol
        df_hold_list.append(df)
    dff = pd.concat(df_hold_list)
    # do calculations and reduce dataframe
    # missing quotes appear as '-', so replace them and convert both Bid and Ask to float
    dff[['Bid','Ask']] = dff[['Bid','Ask']].replace('-','0.0').astype(float)
    dff = dff[dff["Bid"] > 0.04]
    # copy() so the new columns below are set on a real frame, not a filtered view
    dff = dff[dff['Strike'] %10 == 0].copy()
    dff['Mark'] = dff['Ask'] - ((dff['Ask']-dff['Bid'])/2)
    dff['Period Low'] = dff['Mark']
    dff.set_index('Date',inplace=True)
    # use copy() as sometimes pandas will throw errors about modifying a slice of a dataframe
    dff = dff[['Ticker','Contract Name','Expiration','Strike','Bid','Mark','Ask','Period Low']].copy()
    print(dff)
    groupby = dff.groupby('Contract Name')
    for n,g in list(groupby)[0:2]:
        csv_filename = "{}.csv".format(n)
        #check if file exists; if so append, if not create
        if os.path.exists(csv_filename):
            #append current day's options chain (to_csv opens and closes the file itself)
            print('Opening & Appending: ' + str(csv_filename))
            g.to_csv(csv_filename,mode='a',header=False)
            print('Done.')
            #convert csv to df
            df = pd.read_csv(csv_filename)
            #define variables
            low_period = 14
            df['Previous Low'] = df['Period Low'].shift(+1)
            # check these calculations??? not sure I have them correct
            df['Period Low'] = df['Mark'].rolling(window = low_period).min()
            df['Period High'] = df['Mark'].rolling(window = low_period).max()
            df['Mark Delta %'] = ((df['Mark'] - (df['Mark'].shift(+1)))/df['Mark'].shift(+1))
            df.set_index('Date',inplace = True)
            #overwrite existing csv from dataframe
            print('Updating ' + str(csv_filename) + ' from data frame.')
            df.to_csv(csv_filename)
            print(str(csv_filename) + ' is complete.')
        else:
            #Create new csv file
            print('Creating: ' + str(csv_filename))
            g.to_csv(csv_filename)
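On the "check these calculations???" comment, one thing worth checking is the order of operations. 'Previous Low' is shifted from the 'Period Low' values stored in the file before they are recomputed. It therefore reflects whatever an earlier run wrote, which for a newly created file is just Mark. The sketch below computes the rolling low first and then shifts it. It uses made-up values and a 3-row window so the effect is visible; the question uses 14.

```python
import pandas as pd

# Illustrative Mark history for one contract, oldest first (made-up values).
df = pd.DataFrame({"Mark": [10.0, 8.0, 9.0, 7.0, 7.5]})
low_period = 3  # short window for illustration; the question uses 14

# Recompute the rolling low/high from Mark first...
df["Period Low"] = df["Mark"].rolling(window=low_period, min_periods=1).min()
df["Period High"] = df["Mark"].rolling(window=low_period, min_periods=1).max()
# ...then shift, so "Previous Low" is the low of the window ending on the prior row.
df["Previous Low"] = df["Period Low"].shift(1)

# Same percentage-change formula as the answer's "Mark Delta %".
df["Mark Delta %"] = (df["Mark"] - df["Mark"].shift(1)) / df["Mark"].shift(1)
print(df)
```

min_periods=1 is a design choice here. Without it, rolling(window=14) leaves the first 13 rows of every contract as NaN.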