Counting the frequency of tweets containing a specific word over a year

Problem description

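I am trying to count the tweets containing a given word over a one-year period, recording the number of tweets for each day and then storing the results in a CSV file with "Date" and "Frequency" columns. This is my code, but after it has been running for a while I keep getting an error.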

import pandas as pd
import twint
import nest_asyncio
from datetime import datetime,timedelta


bugun = '2020-01-01'
yarin = '2020-01-02'

df = pd.DataFrame(columns=("Data","Frequency")) 




for i in range(365):
    
    file = open("Test.csv","w")
    file.close()
    
    bugun = (datetime.strptime(bugun,'%Y-%m-%d') + timedelta(days=1)).strftime('%Y-%m-%d')

    yarin =(datetime.strptime(yarin,'%Y-%m-%d') + timedelta(days=1)).strftime('%Y-%m-%d')

    nest_asyncio.apply()
    
    c = twint.Config()
    c.Search = "Chainlink"

    #c.Hide_output=True
    c.Since= bugun
    c.Until= yarin

    c.Store_csv = True
    c.Output = "Test.csv"
    c.Count = True 

    twint.run.Search(c)


    data = pd.read_csv("Test.csv")
    frequency = str(len(data))
    
    #d = {"Data": [bugun],"Frequency": [frequency]}

    #d_f = pd.DataFrame(data=d)
    
    #df = df.append(d_f,ignore_index=True)
    

    df.loc[i] = [bugun] + [frequency]
    df.to_csv (r'C:\Users\serap\Desktop\CRYPTO 100\Chainlink.csv',index = False,header=False)

The error I get is this:

  File "C:\Users\serap\Desktop\CRYPTO 100\CODES\Binance_Coin\Binance Coin.py", line 47, in <module>
    data = pd.read_csv("Test.csv")

  File "C:\Users\serap\AppData\Local\Programs\Python\python38\lib\site-packages\pandas\io\parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)

  File "C:\Users\serap\AppData\Local\Programs\Python\python38\lib\site-packages\pandas\io\parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)

  File "C:\Users\serap\AppData\Local\Programs\Python\python38\lib\site-packages\pandas\io\parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)

  File "C:\Users\serap\AppData\Local\Programs\Python\python38\lib\site-packages\pandas\io\parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]

  File "C:\Users\serap\AppData\Local\Programs\Python\python38\lib\site-packages\pandas\io\parsers.py", line 1893, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)

  File "pandas\_libs\parsers.pyx", line 521, in pandas._libs.parsers.TextReader.__cinit__

EmptyDataError: No columns to parse from file

Thanks for your help :)

Solution

After reading the tutorial How to Scrape Tweets from Twitter with Python Twint | by Andika Pratama | Analytics Vidhya | Medium, I think you would be better off letting Twint do the iteration over the whole date range:

c = twint.Config()
c.Search = "Chainlink"
c.Since = "2020-01-01"
c.Until = "2021-01-01"
c.Store_csv = True
c.Output = "Test.csv"
c.Count = True 
twint.run.Search(c)

Now you can iterate over the CSV output:

data = pd.read_csv("Test.csv")
# ...
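
To get from that single CSV to one count per day, grouping by the date column is probably enough. The following is only a sketch: it assumes the Twint CSV has a column named "date" holding YYYY-MM-DD strings (check your own Test.csv header), it uses a small in-memory sample in place of the real file, and daily_frequency is a hypothetical helper name, not part of Twint or pandas.

```python
import io

import pandas as pd

# Hypothetical sample standing in for Test.csv.
# Assumption: the real Twint output has a "date" column in YYYY-MM-DD format.
sample_csv = io.StringIO(
    "id,date,tweet\n"
    "1,2020-01-01,Chainlink is up\n"
    "2,2020-01-01,Chainlink news\n"
    "3,2020-01-03,more Chainlink\n"
)


def daily_frequency(csv_source, start, end):
    """Count rows per day; days with no tweets get a frequency of 0."""
    data = pd.read_csv(csv_source)
    # Number of rows (tweets) for each distinct date string.
    counts = data.groupby("date").size()
    # Every calendar day in the range, so that missing days still appear.
    all_days = list(pd.date_range(start, end, freq="D").strftime("%Y-%m-%d"))
    counts = counts.reindex(all_days, fill_value=0)
    return pd.DataFrame({"Date": all_days, "Frequency": counts.to_numpy()})


result = daily_frequency(sample_csv, "2020-01-01", "2020-01-03")
print(result)
```

With the real data you would pass "Test.csv" instead of sample_csv and then call result.to_csv(...) once at the end, rather than rewriting the output file on every loop iteration.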

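So far I have not found any detailed documentation of the CSV output, but the twint source code (master/twint/storage/write.py (line 58 ff)) indicates that, for CSV, output is appended if the file already exists. So you will probably have to truncate the file or delete the existing one first. One option that should work is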

open("Test.csv", "w").close()

...which is basically the same as what you did, but without introducing another variable.
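
As for the EmptyDataError itself: pandas raises it when read_csv is given a file with no content at all. My guess (an assumption, I have not reproduced it) is that on a day with no matching tweets nothing gets written, so the freshly truncated Test.csv stays empty. If you keep the per-day loop, you could treat that case as a count of zero. A minimal sketch, where count_rows is a hypothetical helper and a temporary file stands in for Test.csv:

```python
import os
import tempfile

import pandas as pd
from pandas.errors import EmptyDataError


def count_rows(path):
    """Return the number of data rows in a CSV, or 0 if it is missing or empty."""
    try:
        return len(pd.read_csv(path))
    except (FileNotFoundError, EmptyDataError):
        # No file or no header line: treat it as "no tweets found".
        return 0


# Demonstration with a temporary file standing in for Test.csv.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Test.csv")

    open(path, "w").close()          # truncated, nothing written
    empty_count = count_rows(path)

    with open(path, "w") as f:       # header plus one data row
        f.write("id,date\n1,2020-01-01\n")
    filled_count = count_rows(path)

print(empty_count, filled_count)
```

In your loop that would mean replacing len(data) with count_rows("Test.csv").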