Best way to split string output with predefined sections into a dictionary

Problem description

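I'm currently working with the output of a program (cmus) that I want to convert into a dictionary in Python. Here is an example of the cmus output I'm working with: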

status playing file /home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac artist Tsegue-maryam Guebrou albumartist Tsegue-maryam Guebrou album Ethiopiques, vol. 21: Emahoy (Piano Solo) discnumber 1 tracknumber 14 title The Story of the Wind date 2005-12-01 duration 166

Every output contains these fields, always in this order:

status
file
artist
albumartist
album
discnumber
tracknumber
title
date
duration

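I've been learning a lot of Python lately and have spent the whole day trying to crack this. How would I build a dictionary like this from the output: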

cmus_output = {
    "status": "playing",
    "file": "/home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac",
    "artist": "Tsegue-maryam Guebrou",
    "albumartist": "Tsegue-maryam Guebrou",
    "album": "Ethiopiques, vol. 21: Emahoy (Piano Solo)",
    "discnumber": "1",
    "tracknumber": "14",
    "title": "The Story of the Wind",
    "date": "2005-12-01",
    "duration": "166"
}

I've tried many things, all without success. I did find this on the cmus wiki:

import sys

def status_data(item):
    """Return the requested cmus status data."""

    # We loop through cmus status data and use each of its known data
    # types as 'delimiters', collecting data until we reach one,
    # inserting it into the dictionary -- rinse and repeat.

    # cmus helper script provides our data as argv[1].
    cmus_data = sys.argv[1]

    # Split the data into an easily-parsed list.
    cmus_data = cmus_data.split()

    # Our temporary collector list.
    collector = []

    # Dictionary that will contain our parsed-out data.
    cmus_info = {'status': "", 'file': "", 'artist': "", 'album': "", 'discnumber': "", 'tracknumber': "", 'title': "", 'date': "", 'duration': ""}

    # Loop through cmus data and write it to our dictionary.
    last_found = "status"
    for value in cmus_data:
        collector.append(value)
        # Check to see if cmus value matches dictionary key.
        for key in cmus_info:
            # If a match has been found, record the data.
            if key == value:
                collector.pop()
                cmus_info[last_found] = " ".join(collector)
                collector = []
                last_found = key

    # Return whatever data main() requests.
    return cmus_info[item]

Unfortunately, after a lot of testing, this doesn't seem to return the duration, which my program needs.

Solution

As @snakecharmerb pointed out in the comments, you collect the data for the last key (duration) but never add it to the dictionary.

You have to add one line after the loop:

for value in cmus_data:
    collector.append(value)
    ...

cmus_info[last_found] = " ".join(collector)  # here, last_found is "duration"

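But what happens if a key appears inside the artist name (or the song title, album name, and so on)? You get wrong data. (Note that for testing I use a helper function that extracts the data from a string into a dictionary.)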

def extract_data(cmus_data):
    cmus_data = cmus_data.split()
    collector = []

    # Dictionary that will contain our parsed-out data.
    cmus_info = {'status': "", 'file': "", 'artist': "", 'album': "", 'discnumber': "", 'tracknumber': "", 'title': "", 'date': "", 'duration': ""}

    last_found = "status"
    for value in cmus_data:
        collector.append(value)
        for key in cmus_info:
            if key == value:
                collector.pop()
                cmus_info[last_found] = " ".join(collector)
                collector = []
                last_found = key

    cmus_info[last_found] = " ".join(collector)

    return cmus_info


print(extract_data("status playing file /home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac artist Tsegue-maryam Guebrou album Ethiopiques,vol. 21: Emahoy (Piano Solo) discnumber 1 tracknumber 14 title The Story of the status date 2005-12-01 duration 166"))

Note the title: The Story of the status. Output:

{'status': '', 'file': '/home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac', 'artist': 'Tsegue-maryam Guebrou', 'album': 'Ethiopiques,vol. 21: Emahoy (Piano Solo)', 'discnumber': '1', 'tracknumber': '14', 'title': 'The Story of the', 'date': '2005-12-01', 'duration': '166'}

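The word status inside the title was treated as a key: it cut the title short and then overwrote the status that had already been collected.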

Since you know the expected order of the keys, you should take advantage of that information:

def extract_data2(cmus_data):
    cmus_data = cmus_data.split()
    collector = []

    # Dictionary that will contain our parsed-out data.
    cmus_keys = ['status', 'file', 'artist', 'album', 'discnumber', 'tracknumber', 'title', 'date', 'duration']
    cmus_info = {}

    last_found = None
    it = iter(cmus_keys)
    expected_key = next(it)  # the first key
    for value in cmus_data:
        if value == expected_key:
            if last_found is not None:  # not the first key
                cmus_info[last_found] = " ".join(collector)
                collector = []
            last_found = expected_key
            expected_key = next(it, None)  # we know the next expected key
        else:
            collector.append(value)

    cmus_info[last_found] = " ".join(collector)

    return cmus_info


print(extract_data2("status playing file /home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac artist Tsegue-maryam Guebrou album Ethiopiques,vol. 21: Emahoy (Piano Solo) discnumber 1 tracknumber 14 title The Story of the status date 2005-12-01 duration 166"))

Output:

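{'status': 'playing', 'file': '/home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac', 'artist': 'Tsegue-maryam Guebrou', 'album': 'Ethiopiques,vol. 21: Emahoy (Piano Solo)', 'discnumber': '1', 'tracknumber': '14', 'title': 'The Story of the status', 'date': '2005-12-01', 'duration': '166'}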

It works, but you should also add checks on the data format (for example, that duration is an integer, and so on).
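Here is a minimal sketch of what such checks might look like. It keeps the ordered-key loop of extract_data2 and then raises ValueError when a key is missing, a numeric field is not an integer, or the date is not in YYYY-MM-DD form. The name extract_data_checked, the CMUS_KEYS constant, and the exact set of checks are hypothetical choices for illustration; like extract_data2, it leaves out albumartist.

```python
import re

# Assumed field order, same as extract_data2 (albumartist is not included).
CMUS_KEYS = ["status", "file", "artist", "album", "discnumber",
             "tracknumber", "title", "date", "duration"]


def extract_data_checked(cmus_data):
    """Ordered-key parsing like extract_data2, plus basic format checks.

    Raises ValueError instead of silently returning partial data.
    """
    collector = []
    info = {}
    it = iter(CMUS_KEYS)
    expected_key = next(it)
    last_found = None
    for value in cmus_data.split():
        if value == expected_key:
            if last_found is not None:
                info[last_found] = " ".join(collector)
            collector = []  # also drops any stray text before the first key
            last_found = expected_key
            expected_key = next(it, None)  # None once all keys were seen
        else:
            collector.append(value)
    if last_found is not None:
        info[last_found] = " ".join(collector)

    # Every expected key must have been found, in order.
    missing = [key for key in CMUS_KEYS if key not in info]
    if missing:
        raise ValueError(f"missing keys: {missing}")

    # Numeric fields must be plain non-negative integers.
    for key in ("discnumber", "tracknumber", "duration"):
        if not re.fullmatch(r"[0-9]+", info[key]):
            raise ValueError(f"{key} is not an integer: {info[key]!r}")

    # Dates are expected as YYYY-MM-DD.
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", info["date"]):
        raise ValueError(f"unexpected date: {info['date']!r}")

    return info
```

The values are left as strings to match the dictionary from the question; converting duration with int() after the check would be a small extra step.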

The simplest solution is probably a regular expression:

import re

REGEX = re.compile(r"^status (?P<status>.+?) file (?P<file>.+?) artist (?P<artist>.+?) album (?P<album>.+?) discnumber (?P<discnumber>\d+?) tracknumber (?P<tracknumber>\d+?) title (?P<title>.+?) date (?P<date>\d{4}-\d{2}-\d{2}) duration (?P<duration>\d+?)$")

def extract_data3(cmus_data):
    return REGEX.match(cmus_data).groupdict()

print(extract_data3("status playing file /home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac artist Tsegue-maryam Guebrou album Ethiopiques,vol. 21: Emahoy (Piano Solo) discnumber 1 tracknumber 14 title The Story of the status date 2005-12-01 duration 166"))

Output:

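{'status': 'playing', 'file': '/home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac', 'artist': 'Tsegue-maryam Guebrou', 'album': 'Ethiopiques,vol. 21: Emahoy (Piano Solo)', 'discnumber': '1', 'tracknumber': '14', 'title': 'The Story of the status', 'date': '2005-12-01', 'duration': '166'}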

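Of course, this is still fragile, because the format may have exceptions: something like date unknown would make the function fail (REGEX.match returns None, so calling .groupdict() on it raises an AttributeError).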
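A sketch of a somewhat more forgiving pattern, under a few assumptions: albumartist may or may not be present (the question's sample has it, the test string above does not), date may be any single word such as unknown, and a line that does not match should give None rather than an exception. CMUS_RE and extract_data_safe are hypothetical names.

```python
import re

# Hypothetical, more forgiving variant of REGEX: "albumartist" is optional,
# "date" may be any single word (e.g. "unknown"), and numbers stay strict.
CMUS_RE = re.compile(
    r"^status (?P<status>\S+)"
    r" file (?P<file>.+?)"
    r" artist (?P<artist>.+?)"
    r"(?: albumartist (?P<albumartist>.+?))?"
    r" album (?P<album>.+?)"
    r" discnumber (?P<discnumber>\d+)"
    r" tracknumber (?P<tracknumber>\d+)"
    r" title (?P<title>.+?)"
    r" date (?P<date>\S+)"
    r" duration (?P<duration>\d+)$"
)


def extract_data_safe(cmus_data):
    """Return the fields as a dict, or None if the line does not match."""
    match = CMUS_RE.match(cmus_data)
    if match is None:
        return None
    # Groups that did not take part in the match (a missing albumartist)
    # come back as None.
    return match.groupdict()
```

Because title is lazy and must be followed by " date <word> duration <digits>" at the very end, a title containing status or even date still parses. A value that contains something like " album " in the wrong place can still confuse it, so it remains a heuristic.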


Another, simpler approach:

cmus_keys = ["status", "file", "artist", "albumartist", "album", "discnumber", "tracknumber", "title", "date", "duration"]
# cmus_data is the raw output string; "#" is used as the separator, so it must not occur in the data.
for key in cmus_keys:
    cmus_data = cmus_data.replace(key + " ", key + "#").replace(" " + key, "#" + key)
cmus_list = cmus_data.split("#")
cmus_info = dict(zip(cmus_list[::2], cmus_list[1::2]))

Here, cmus_info will contain the key-value pairs.
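The same split-on-keys idea can be sketched with a single re.split call, assuming a key only counts when it is a whole space-delimited word. That avoids matching "artist" inside "albumartist" and needs no "#" sentinel, so a "#" in a file path is harmless. KEY_SPLIT and split_on_keys are hypothetical names. It still shares the weakness shown earlier: a key word inside a value (The Story of the status) starts a new field and overwrites the earlier one.

```python
import re

CMUS_KEYS = ["status", "file", "artist", "albumartist", "album",
             "discnumber", "tracknumber", "title", "date", "duration"]

# A key matches only at the start or after a space, and only when followed
# by a space or the end of the string, i.e. as a whole word.
KEY_SPLIT = re.compile(r"(?:^| )(" + "|".join(CMUS_KEYS) + r")(?= |$)")


def split_on_keys(cmus_data):
    # With a capturing group, re.split keeps the keys in the result:
    # parts[0] is the text before the first key (normally ""), then
    # key, value, key, value, ... alternate.
    parts = KEY_SPLIT.split(cmus_data)
    return dict(zip(parts[1::2], (value.strip() for value in parts[2::2])))
```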
