RecursionError when downloading SEC data

Problem description

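I am currently trying to download S-1 filings from SEC EDGAR using the sec_edgar_downloader library. I have a Pandas DataFrame of CIK values, and for each value I want to download the associated S-1 when one is available. To keep track of which companies do not have one, I add a new column that equals 1 when the filing is found and downloaded and 0 otherwise. The code I run is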

df["s1"] = df["s1"].apply(lambda x : tryconvert(CIK_check(x)))

where tryconvert() is a function defined as

def tryconvert(x):
    try:
        CIK_check(x)
    except RecursionError:
        return "0"

and CIK_check() is a function defined as

def CIK_check(x):
    time.sleep(0.3)
    if dl.get("S-1",x) == 1:
        return "1"
    else:
        return "0"

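CIK_check downloads the filing when it is available and returns a binary value indicating whether the download succeeded. I had to add tryconvert() to try to work around an error that eventually appears when I run the code, which raises the following: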

RecursionError                            Traceback (most recent call last)
<ipython-input-243-a8a327555f29> in <module>
----> 1 df["s1"] = df["s1"].apply(lambda x : tryconvert(CIK_check(x)))

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in apply(self,func,convert_dtype,args,**kwds)
   3846             else:
   3847                 values = self.astype(object).values
-> 3848                 mapped = lib.map_infer(values,f,convert=convert_dtype)
   3849 
   3850         if len(mapped) and isinstance(mapped[0],Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-243-a8a327555f29> in <lambda>(x)
----> 1 df["s1"] = df["s1"].apply(lambda x : tryconvert(CIK_check(x)))

<ipython-input-241-62c62b553142> in CIK_check(x)
      1 def CIK_check(x):
      2     time.sleep(0.3)
----> 3     if dl.get("S-1",x) == 1:
      4         return "1"
      5     else:

~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/Downloader.py in get(self,filing,ticker_or_cik,amount,after,before,include_amends,download_details,query)
    167         )
    168 
--> 169         download_filings(
    170             self.download_folder,
    171             ticker_or_cik,

~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/_utils.py in download_filings(download_folder,filing_type,filings_to_fetch,include_filing_details)
    261         if include_filing_details:
    262             try:
--> 263                 download_and_save_filing(
    264                     download_folder,
    265                     ticker_or_cik,

~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/_utils.py in download_and_save_filing(download_folder,accession_number,download_url,save_filename,resolve_urls)
    218     if resolve_urls and Path(save_filename).suffix == ".html":
    219         base_url = f"{download_url.rsplit('/',1)[0]}/"
--> 220         filing_text = resolve_relative_urls_in_filing(filing_text,base_url)
    221 
    222     # Create all parent directories as needed and write content to file

~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/_utils.py in resolve_relative_urls_in_filing(filing_text,base_url)
    198         return soup
    199 
--> 200     return soup.encode(soup.original_encoding)
    201 
    202 

~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in encode(self,encoding,indent_level,formatter,errors)
   1526         # Turn the data structure into Unicode,then encode the
   1527         # Unicode.
-> 1528         u = self.decode(indent_level,formatter)
   1529         return u.encode(encoding,errors)
   1530 

~/opt/anaconda3/lib/python3.8/site-packages/bs4/__init__.py in decode(self,pretty_print,eventual_encoding,formatter)
    742         else:
    743             indent_level = 0
--> 744         return prefix + super(BeautifulSoup,self).decode(
    745             indent_level,formatter)
    746 

~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in decode(self,formatter)
   1596         else:
   1597             indent_contents = None
-> 1598         contents = self.decode_contents(
   1599             indent_contents,formatter
   1600         )

~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in decode_contents(self,formatter)
   1690                 text = c.output_ready(formatter)
   1691             elif isinstance(c,Tag):
-> 1692                 s.append(c.decode(indent_level,
   1693                                   formatter))
   1694             preserve_whitespace = (

... last 2 frames repeated,from the frame below ...

~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in decode(self,formatter
   1600         )

RecursionError: maximum recursion depth exceeded

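However, this does not work: I still get the error, which prevents me from completing the task. What could be causing it? (Unfortunately, because the function is applied over a Pandas DataFrame, it is not clear which entry raises the error.) Is there another way to get past the RecursionError without halting the computation, simply treating it as a failed download marked 0?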
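
One detail in the traceback may explain why the try/except never fires: the frames go from the lambda straight into CIK_check, with no tryconvert frame in between. In `tryconvert(CIK_check(x))`, Python evaluates `CIK_check(x)` first, so the exception is raised before tryconvert is entered. The sketch below passes the function itself into the wrapper so the call happens inside the `try`. It also records which CIK failed. The names `safe_check` and `failed_ciks` are hypothetical, introduced only for illustration.

```python
# Hypothetical list collecting the CIKs whose download hit a RecursionError,
# so the offending entries can be inspected afterwards.
failed_ciks = []


def safe_check(check, cik):
    """Call check(cik) inside the try block; treat RecursionError as a failed download."""
    try:
        return check(cik)
    except RecursionError:
        failed_ciks.append(cik)
        return "0"


# Usage with the names from the question (df and CIK_check are assumed to exist):
# df["s1"] = df["s1"].apply(lambda x: safe_check(CIK_check, x))
```

Note that the wrapper returns the result of the call. The original tryconvert has no `return` on the success path, so it would yield `None` even when the download works.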

Solution
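
As for the likely cause: the traceback ends inside bs4's `decode` and `decode_contents`, which call each other once per nested tag. A filing whose HTML is nested more deeply than Python's recursion limit (1000 by default) would plausibly trigger this error. One possible workaround, shown below as a minimal sketch rather than a confirmed fix, is to raise the limit temporarily around the download. The helper `recursion_limit` is hypothetical, and `nesting_depth` is only a stand-in for a recursive tree walk.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the interpreter's recursion limit, then restore it."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(max(previous, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)


def nesting_depth(levels):
    """Stand-in for a recursive tree walk: one Python frame per nesting level."""
    return 0 if levels == 0 else 1 + nesting_depth(levels - 1)


# Usage with the names from the question (dl is assumed to exist):
# with recursion_limit(5000):
#     dl.get("S-1", cik)
```

A very high limit can crash the interpreter instead of raising an exception, so a moderate value combined with catching RecursionError is the safer choice. The traceback also shows that the HTML is parsed only under `if include_filing_details:`, and that `Downloader.get` accepts a `download_details` parameter. Passing `download_details=False` might skip this code path entirely, but that is an assumption to verify against the library's documentation.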
