Extracting an image from an HTML/JS canvas with Python

Question

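I'm using Python to build a Discord bot for my gaming community, and I'm currently implementing a command that returns the game's status (using this website). The full code for the command is here.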

Right now it only shows the following:

[image: the bot's current output]

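But I'd also like to add the chart you can find on the website. I use BeautifulSoup to get the other values, and getting an image would be easy too. However, the chart is not an image; it is a canvas object drawn with JavaScript/HTML. I don't know how it works under the hood, but I can very easily convert it to an image "locally" by right-clicking it and choosing to copy the image.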

My question is: how can I retrieve that canvas object as an image in my Python code?

When I google this, I mostly get Tkinter results, which don't really help.

Answer

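I'm not sure you can do that directly. Since the chart data is injected into the HTML on the server side, the only way to get the data from the static HTML is to parse the script and convert the values into Python data types (preferably a pandas DataFrame, which makes plotting easy).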
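To make the idea concrete, here is a minimal sketch that uses only the standard library. The HTML fragment, the variable name `data`, and the `x`/`y` field names are assumptions made up for illustration, not the real page. It only works when the embedded object literal happens to be valid JSON (quoted keys, no trailing commas) and contains no `};` inside a string; anything looser calls for a real JavaScript parser.

```python
import json
import re

# Hypothetical page fragment; the real site's markup and field names may differ.
SAMPLE_HTML = """
<div id="chart-row">
  <script>
    var data = {"series": [{"x": "2020-05-01T10:00:00-04:00", "y": 3},
                           {"x": "2020-05-01T11:00:00-04:00", "y": 7}]};
  </script>
</div>
"""

def extract_series(html):
    """Return the list stored under data.series, assuming the literal is valid JSON."""
    # Capture everything from the first '{' up to the first '}' that is followed by ';'.
    match = re.search(r"var\s+data\s*=\s*(\{.*?\})\s*;", html, re.DOTALL)
    if match is None:
        raise ValueError("no 'var data = {...};' assignment found")
    return json.loads(match.group(1))["series"]

points = extract_series(SAMPLE_HTML)
```

Each entry of `points` is then an ordinary dict, so `pd.DataFrame(points)` would turn it into a DataFrame directly.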

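I wrote the following (messy) code, which may help you. I used PyJSParser to parse the script and pull the variable's values out of it.

I left some comments in the code. Please read them.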

from bs4 import BeautifulSoup as bs
import matplotlib.pyplot as plt
import pandas as pd
from pyjsparser import parse

def parseScript(scriptContent):
    res = parse(scriptContent)

    df = pd.DataFrame(columns=['timestamp','report'])

    # This part is tricky: the parse tree is several levels deep, and there
    # is no guarantee the server won't change the order of things, so we
    # check the type and name at every level to make sure we have found
    # what we want. Uncomment the print statements below to see how deep
    # the tree goes. It's a rabbit hole.

    for obj in res['body']:
        if obj['type'] == 'VariableDeclaration':
            for declaration in obj['declarations']:
                if declaration['type'] == 'VariableDeclarator':
                    if declaration['id']['name'] == 'data':
                        # print(declaration.keys())
                        # print(declaration['type'])
                        # print(declaration['id'])
                        # print(declaration['init'].keys())
                        # print(declaration['init']['type'])
                        # print(type(declaration['init']['properties']))
                        for subVar in declaration['init']['properties']:
                            # print(subVar.keys())
                            # print(subVar['type'])
                            # print(subVar['key'])
                            if subVar['key']['name'] == 'series':
                                # print(len(subVar['value']))
                                # print(type(subVar['value']))
                                # print(subVar['value'].keys())
                                # print(len(subVar['value']['elements']))

                                for element in subVar['value']['elements']:
                                    # print(type(element))
                                    # print(element.keys())
                                    # print(element['properties'][0].keys())
                                    timestamp = element['properties'][0]['value']['value']
                                    report = element['properties'][1]['value']['value']
                                    df.loc[len(df)] = [timestamp,report]
    return df

def scraper(soup):
    # First narrow the search to the div that holds the chart's script,
    # so we don't pick up some other script from the page by mistake.
    chartDiv = soup.find('div', attrs={'id': 'chart-row'})
    if chartDiv is None:
        raise ValueError("no div with id 'chart-row' found in the page")

    # The first script inside that div carries the chart data.
    scriptContent = chartDiv.find('script').string

    reportData = parseScript(scriptContent)
    return reportData

def plotData(df, duration=24):
    '''
    @param df dataframe built from the scraped page's script
    @param duration how many hours of data to plot, counted back from now
    '''
    import datetime as dt
    import pytz

    # Work on a copy so the caller's dataframe is left untouched,
    # and convert the timestamp column to datetime objects.
    df = df.copy()
    df['timestamp'] = pd.to_datetime(df['timestamp'])

    # The source uses a fixed offset of -240 minutes (UTC-4).
    timeZone = pytz.FixedOffset(-240)
    cutoff = dt.datetime.now(timeZone) - dt.timedelta(hours=duration)
    df = df[df['timestamp'] >= cutoff]

    # Sum the reports per hour of the day and plot the resulting series.
    hourly = df.groupby(df['timestamp'].dt.hour)['report'].sum()
    hourly.plot()
    plt.show()

if __name__ == '__main__':
    with open('lala.html','rb') as file:
        soup = bs(file,'html5lib')

    data = scraper(soup)
    plotData(data)
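The nested loops above assume `data` is declared at the top level of the script and that the two fields always come in the same order. A recursive search is one way to loosen both assumptions. The sketch below works on plain dicts and lists in the ESTree shape that pyjsparser is expected to return; the hand-written tree, the function name `init`, and the field names `x` and `y` are hypothetical stand-ins, not the real page's structure.

```python
def find_declarator(node, name):
    """Depth-first search of an ESTree-style tree for `var <name> = ...`."""
    if isinstance(node, dict):
        if (node.get("type") == "VariableDeclarator"
                and (node.get("id") or {}).get("name") == name):
            return node
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None  # strings, numbers, None: nothing to descend into
    for child in children:
        found = find_declarator(child, name)
        if found is not None:
            return found
    return None

def object_to_dict(obj_node):
    """Turn an ObjectExpression of literal values into a dict keyed by property name."""
    result = {}
    for prop in obj_node["properties"]:
        key = prop["key"]
        # Keys may be identifiers (x: 1) or string literals ("x": 1).
        key_name = key["name"] if key["type"] == "Identifier" else key["value"]
        result[key_name] = prop["value"].get("value")
    return result

# Hand-written tree standing in for the parse of:
#   function init() { var data = {series: [{x: "10:00", y: 3}]}; }
tree = {"type": "Program", "body": [{
    "type": "FunctionDeclaration",
    "id": {"type": "Identifier", "name": "init"},
    "body": {"type": "BlockStatement", "body": [{
        "type": "VariableDeclaration",
        "declarations": [{
            "type": "VariableDeclarator",
            "id": {"type": "Identifier", "name": "data"},
            "init": {"type": "ObjectExpression", "properties": [{
                "type": "Property",
                "key": {"type": "Identifier", "name": "series"},
                "value": {"type": "ArrayExpression", "elements": [{
                    "type": "ObjectExpression", "properties": [
                        {"type": "Property",
                         "key": {"type": "Identifier", "name": "x"},
                         "value": {"type": "Literal", "value": "10:00"}},
                        {"type": "Property",
                         "key": {"type": "Identifier", "name": "y"},
                         "value": {"type": "Literal", "value": 3.0}},
                    ]}]},
            }]},
        }],
    }]},
}]}

declarator = find_declarator(tree, "data")
series_prop = next(p for p in declarator["init"]["properties"]
                   if p["key"].get("name") == "series")
rows = [object_to_dict(el) for el in series_prop["value"]["elements"]]
```

Looking fields up by name rather than by position means a reordering on the server side would not silently swap the two columns.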

Install the following libraries:

  • html5lib (for better HTML parsing)
  • pandas
  • matplotlib
  • pyjsparser

As for the chart itself, I think the styling is up to you. The scraper() function gives you a DataFrame that you can use to draw the chart.

I plotted the chart very simply, so it may not be to your taste. Give it a try.
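Since the goal in the question is an image a bot can send, not a window on screen, one option is to render the figure into an in-memory PNG instead of calling plt.show(). The following is a rough sketch under that assumption; the function name `chart_to_png` and the sample numbers are made up for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend: no window, suitable for a bot process
import matplotlib.pyplot as plt

def chart_to_png(hours, reports):
    """Draw a simple line chart and return it as an in-memory PNG buffer."""
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.plot(hours, reports, marker="o")
    ax.set_xlabel("hour")
    ax.set_ylabel("reports")
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", bbox_inches="tight")
    plt.close(fig)  # release the figure so repeated commands do not leak memory
    buffer.seek(0)  # rewind so the next reader starts at the first byte
    return buffer

# Made-up sample values, only to exercise the function.
png = chart_to_png([9, 10, 11, 12], [2, 5, 3, 8])
```

With discord.py, a buffer like this can usually be wrapped as `discord.File(png, filename="chart.png")` and passed to `send(file=...)`; check that against the version of the library you are running.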