Converting an uploaded text file to a DataFrame in Python Dash

Problem description

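I'm exploring Dash to build a dashboard for log analysis. I did the analysis in a Jupyter Notebook, but I'm struggling to do the same thing in Dash. After hours of research, I still can't figure out how to convert a text file into a DataFrame in Dash.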

  • I need to do the following in Dash:
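import re
import pandas as pd

def ReadLogFile(LogFile):
    """Read the log file and return its lines with surrounding whitespace stripped."""
    with open(LogFile) as f:
        Log = f.readlines()
    Log = [x.strip() for x in Log]
    return Log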

def Profiles_Submitted(LogFile):
    """
    This method searches for the pattern "Submitting Task For Execution" to list all the profiles
    submitted in the log file
    """
    filterLines = ReadLogFile(LogFile)
    OutputLines = []
    for line in filterLines:
        if re.search('Submitting Task For Execution',line):
            OutputLines.append(line)
    return OutputLines

..... Bunch of Functions in-between ....


def Profiles_Submitted_Clean_DataFrame(LogFile):
    """
    Convert the raw DataFrame's columns from their source data types to the required types
    """
    process_df = Profiles_Submitted_Raw_DataFrame(LogFile)
    process_df['Profile Job Date'] = pd.to_datetime(process_df['Profile Job Date'], format='%Y-%m-%d').astype(str)
    process_df['Profile Job Time'] = pd.to_datetime(process_df['Profile Job Time']).dt.time
    return process_df
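Since `ReadLogFile` is the only function above that touches the filesystem, one way to reuse the rest of the pipeline in Dash is to make the reading step accept text instead of a path. A minimal sketch, assuming the upload has already been decoded to a `str`; `read_log_text` and `profiles_submitted_from_text` are hypothetical names, not part of Dash or of the original code:

```python
import re
from typing import List


def read_log_text(text: str) -> List[str]:
    """Split already-loaded log text into stripped lines.

    For ordinary newline-delimited text this yields the same lines as
    ReadLogFile, without needing a file on disk (hypothetical helper).
    """
    return [line.strip() for line in text.splitlines()]


def profiles_submitted_from_text(text: str) -> List[str]:
    """Keep only the lines that mention 'Submitting Task For Execution'."""
    pattern = re.compile('Submitting Task For Execution')
    return [line for line in read_log_text(text) if pattern.search(line)]


sample = "a\n  Submitting Task For Execution: job1  \nb\n"
print(profiles_submitted_from_text(sample))  # ['Submitting Task For Execution: job1']
```

The same change can be threaded through the other functions: anything that currently takes `LogFile` and calls `ReadLogFile` can take the line list instead, so the Jupyter version and the Dash version share one implementation.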

I found some examples in the Dash documentation that import an xlsx or csv file and turn it into a DataTable, but they didn't help me with text files.

app.layout = html.Div([
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%', 'height': '60px', 'lineHeight': '60px',
            'borderWidth': '1px', 'borderStyle': 'dashed',
            'borderRadius': '5px', 'textAlign': 'center', 'margin': '10px'
        },
        # Allow multiple files to be uploaded
        multiple=True
    ),
    html.Div(id='output-data-upload'),
])


def parse_contents(contents, filename, date):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)

    try:
        if 'csv' in filename:
            # Assume that the user uploaded a CSV file
            df = pd.read_csv(
                io.StringIO(decoded.decode('utf-8')))
        elif 'xls' in filename:
            # Assume that the user uploaded an excel file
            df = pd.read_excel(io.BytesIO(decoded))

    except Exception as e:
        print(e)
        return html.Div([
            'There was an error processing this file.'
        ])

    return html.Div([
        html.H5(filename),
        html.H6(datetime.datetime.fromtimestamp(date)),

        dash_table.DataTable(
            data=df.to_dict('records'),
            columns=[{'name': i, 'id': i} for i in df.columns]
        ),

        html.Hr(),  # horizontal line

        # For debugging, display the raw contents provided by the web browser
        html.Div('Raw Content'),
        html.Pre(contents[0:200] + '...', style={
            'whiteSpace': 'pre-wrap',
            'wordBreak': 'break-all'
        })
    ])
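For a plain-text log, the only part of `parse_contents` that really has to change is how `contents` becomes something the log functions accept. `dcc.Upload` supplies `contents` as a base64 data URL (`data:<mime type>;base64,<payload>`), so turning it back into text needs only the standard library. A sketch; `decode_upload_text` is a hypothetical helper, and UTF-8 is an assumption about how the log files are encoded:

```python
import base64


def decode_upload_text(contents: str, encoding: str = 'utf-8') -> str:
    """Turn a dcc.Upload data URL back into the file's text (encoding is assumed)."""
    # Everything after the first comma is the base64 payload
    _header, content_string = contents.split(',', 1)
    decoded = base64.b64decode(content_string)
    return decoded.decode(encoding)


# Build a data URL the way a browser would, then decode it again
payload = base64.b64encode(b'line 1\nline 2\n').decode('ascii')
contents = 'data:text/plain;base64,' + payload
print(decode_upload_text(contents).splitlines())  # ['line 1', 'line 2']
```

Inside `parse_contents`, the returned string could go through the same line filtering as the Jupyter version, and the resulting DataFrame into the existing `dash_table.DataTable` via `df.to_dict('records')`.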


@app.callback(Output('output-data-upload', 'children'),
              [Input('upload-data', 'contents')],
              [State('upload-data', 'filename'),
               State('upload-data', 'last_modified')])
def update_output(list_of_contents, list_of_names, list_of_dates):
    if list_of_contents is not None:
        # Pass each file's contents, name and modification date to parse_contents
        children = [
            parse_contents(c, n, d) for c, n, d in
            zip(list_of_contents, list_of_names, list_of_dates)]
        return children

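Can someone help me, or point me in the right direction, on how to import a text file, find the required patterns in it, and then convert it to a DataFrame as I did in Jupyter?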

Solution

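If the uploaded file is already in a standard format (e.g. CSV), the example you provided is all you need. If you are processing unstructured files, you can start from the same place, but you have to replace this line: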

df = pd.read_csv(io.StringIO(decoded.decode('utf-8')))

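with your own code that turns the decoded data into a formatted DataFrame. It looks like you may already have some of that code, so find the most convenient way to hand the uploaded file to it, for example the bytes produced by decoded = base64.b64decode(content_string). You may be able to do it directly like this, but I haven't tested it: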

Log = io.StringIO(decoded.decode('utf-8')).readlines()
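One difference from `ReadLogFile` is worth noting: `readlines()` keeps the trailing `\n` on every line, whereas `ReadLogFile` strips each line. Stripping after reading keeps the two in agreement, so the downstream pattern matching sees the same input in Dash as in Jupyter. A small check, where `decoded` is a stand-in for the bytes from `base64.b64decode(content_string)`:

```python
import io

# Stand-in for the bytes produced by base64.b64decode(content_string)
decoded = b'first entry\nSubmitting Task For Execution\n'

raw = io.StringIO(decoded.decode('utf-8')).readlines()
print(raw)  # ['first entry\n', 'Submitting Task For Execution\n']

# Strip each line, as ReadLogFile does, before applying the same filters
Log = [x.strip() for x in raw]
print(Log)  # ['first entry', 'Submitting Task For Execution']
```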
