Python: inserting a CSV header row when converting JSONL to CSV

Problem description

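I'm currently writing a script that converts files from JSONL format to CSV. At the end I added a line that writes a CSV header row, so that every converted variable is labeled. However, the CSV file the script produces seems to get a header for every JSON line it reads. What I want is a file with the header on line 1 and all the values below it, not a separate header for each individual JSON line. I hope someone can help me with this, thanks!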

Sample JSONL:

{"symbol": "DOGE-PERP","timestamp": 1621948955550,"datetime": "2021-05-25T13:22:35.550Z","high": null,"low": null,"bid": 0.342372,"bidVolume": null,"ask": 0.3424855,"askVolume": null,"vwap": null,"open": null,"close": 0.3424025,"last": 0.3424025,"previousClose": null,"change": null,"percentage": 0.039249281423858244,"average": null,"baseVolume": null,"quoteVolume": 433162290.0506585,"info": {"name": "DOGE-PERP","enabled": true,"postOnly": false,"priceIncrement": "5e-7","sizeIncrement": "1.0","minProvideSize": "1.0","last": "0.3424025","bid": "0.342372","ask": "0.3424855","price": "0.3424025","type": "future","baseCurrency": null,"quoteCurrency": null,"underlying": "DOGE","restricted": false,"highLeverageFeeExempt": false,"change1h": "0.023470298206100425","change24h": "0.039249281423858244","changeBod": "-0.07136396489976689","quoteVolume24h": "433162290.0506585","volumeUsd24h": "433162290.0506585"}}
{"symbol": "DOGE-PERP","timestamp": 1621948955976,"datetime": "2021-05-25T13:22:35.976Z","bid": 0.3424955,"ask": 0.3427185,"close": 0.3427185,"last": 0.3427185,"percentage": 0.04020839466903005,"last": "0.3427185","bid": "0.3424955","ask": "0.3427185","price": "0.3427185","change1h": "0.024414849178225707","change24h": "0.04020839466903005","changeBod": "-0.07050693556414092","volumeUsd24h": "433162290.0506585"}}

What the CSV file currently looks like: the header row is repeated above every data row.

What I want to achieve: a single header row on line 1, with all the data rows below it.

My script:

import glob
import json
import csv

import time


start = time.time()
#import pandas as pd
from flatten_json import flatten

#Path of jsonl file
File_path = (r'C:\Users\Natthanon\Documents\Coding 101\Python\JSONL')
#reading all jsonl files
files = [f for f in glob.glob( File_path + "**/*.jsonl",recursive=True)]
i=0
for f in files:
    with open(f,'r') as F:
        for line in F:
#flatten json files 
            data = json.loads(line)
            data_1=flatten(data)
#creating csv files  
            with open(r'C:\Users\Natthanon\Documents\Coding 101\Python\CSV\\' + f.split("\\")[-1] +".csv",'a',newline='') as csv_file:
                thewriter = csv.writer(csv_file)
                thewriter.writerow(["symbol","timestamp","datetime","high","low","bid","bidVolume","ask","askVolume","vwap","open","close","last","previousClose","change","percentage","average","baseVolume","quoteVolume"])
#headers should be the Key values from json files that make Column header
                thewriter.writerow([data_1['symbol'],data_1['timestamp'],data_1['datetime'],data_1['high'],data_1['low'],data_1['bid'],data_1['bidVolume'],data_1['ask'],data_1['askVolume'],data_1['vwap'],data_1['open'],data_1['close'],data_1['last'],data_1['previousClose'],data_1['change'],data_1['percentage'],data_1['average'],data_1['baseVolume'],data_1['quoteVolume']])
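The same pattern can be reproduced in a few lines using only the standard library. In this sketch the helper name `append_rows_reopening`, the shortened two-column header and the temporary output file are illustrative assumptions, not part of the script above. As in the loop above, the output file is reopened in append mode and the header is written once per record:

```python
import csv
import os
import tempfile

HEADER = ["symbol", "bid"]  # hypothetical, shortened header


def append_rows_reopening(path, records):
    """Hypothetical helper mimicking the loop above: reopen in 'a' mode for every record."""
    for rec in records:
        with open(path, "a", newline="") as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow(HEADER)  # executed once per record
            writer.writerow([rec[k] for k in HEADER])


records = [{"symbol": "DOGE-PERP", "bid": 0.342372},
           {"symbol": "DOGE-PERP", "bid": 0.3424955}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    append_rows_reopening(path, records)
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh))

print(rows.count(HEADER))  # 2: one header row per record
```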

Solution

You need to move the open() call for the output CSV file so that it happens before you start parsing the lines, for example:

import csv
import glob
import json
import os

from flatten_json import flatten

# Folder containing the .jsonl files, and folder for the .csv output
jsonl_dir = r'C:\Users\Natthanon\Documents\Coding 101\Python\JSONL'
csv_dir = r'C:\Users\Natthanon\Documents\Coding 101\Python\CSV'

# Column names; each one is also the key looked up in the flattened JSON record
fields = ["symbol", "timestamp", "datetime", "high", "low", "bid", "bidVolume",
          "ask", "askVolume", "vwap", "open", "close", "last", "previousClose",
          "change", "percentage", "average", "baseVolume", "quoteVolume"]

# Every .jsonl file in jsonl_dir and its subfolders
files = glob.glob(os.path.join(jsonl_dir, "**", "*.jsonl"), recursive=True)

for f in files:
    out_path = os.path.join(csv_dir, os.path.basename(f) + ".csv")
    # Open the input and the output once per file, not once per line
    with open(f, 'r') as F, open(out_path, 'w', newline='') as csv_file:
        thewriter = csv.writer(csv_file)
        # Write the header exactly once, before any data rows
        thewriter.writerow(fields)

        for line in F:
            data_1 = flatten(json.loads(line))
            # .get() yields None (written as an empty cell) for keys a record lacks
            thewriter.writerow([data_1.get(k) for k in fields])
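A variation on the same fix, sketched with the standard library's `csv.DictWriter`. Here `writeheader()` is called once before the loop, and `restval=""` fills columns missing from a record (the second sample record, for example, has no `high` or `low` key). `extrasaction="ignore"` drops keys that are not in the header, such as the nested `info` object. This sketch skips `flatten_json`, assuming only the top-level fields in the header are needed. The function name `jsonl_to_csv` and the short in-memory sample are made up for illustration.

```python
import csv
import io
import json

# Same column list as the header above; each name is also a top-level JSON key
FIELDS = ["symbol", "timestamp", "datetime", "high", "low", "bid", "bidVolume",
          "ask", "askVolume", "vwap", "open", "close", "last", "previousClose",
          "change", "percentage", "average", "baseVolume", "quoteVolume"]


def jsonl_to_csv(lines, out):
    """Hypothetical helper: write one header row, then one CSV row per JSON line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS,
                            restval="",             # missing key -> empty cell
                            extrasaction="ignore")  # keys not in FIELDS (e.g. "info") are dropped
    writer.writeheader()  # called once, before the loop
    for line in lines:
        if line.strip():  # skip blank lines such as a trailing newline
            writer.writerow(json.loads(line))


# Usage with a small in-memory sample (values taken from the sample JSONL)
sample = [
    '{"symbol": "DOGE-PERP", "timestamp": 1621948955550, "high": null, "bid": 0.342372, "info": {"name": "DOGE-PERP"}}',
    '{"symbol": "DOGE-PERP", "timestamp": 1621948955976, "bid": 0.3424955}',
]
buf = io.StringIO()
jsonl_to_csv(sample, buf)
rows = list(csv.reader(buf.getvalue().splitlines()))
print(len(rows))  # 3: one header row plus two data rows
```

When writing to a real file instead of `io.StringIO`, open it with `newline=''`, as in the script above.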

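In your original code, the output CSV file is opened (in append mode, 'a') and closed again for every JSON line, and the header row is written each time it is opened. As a result, the header ends up above every data row. Opening the output file once per input file, writing the header once, and only then looping over the lines fixes this.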
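If you do need append mode, for example to add rows to the same CSV across several runs, one common approach is to write the header only when the output file does not exist yet or is empty. This is a minimal sketch under that assumption. `append_records` is a hypothetical helper, and the demo writes to a temporary directory.

```python
import csv
import os
import tempfile


def append_records(path, fieldnames, records):
    """Hypothetical helper: append rows, writing the header only for a new or empty file."""
    # Check before opening, because mode "a" creates the file if it is missing
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames,
                                restval="", extrasaction="ignore")
        if needs_header:
            writer.writeheader()
        writer.writerows(records)


# Two separate calls (e.g. two runs) append to the same file
fields = ["symbol", "bid"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    append_records(path, fields, [{"symbol": "DOGE-PERP", "bid": 0.342372}])
    append_records(path, fields, [{"symbol": "DOGE-PERP", "bid": 0.3424955}])
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh))

print(rows.count(fields))  # 1: the header is written only once
```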