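Problem description
I am currently working on converting the output of a program (cmus) into an array in Python. This is a sample of the cmus output I am working with:
status playing file /home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac artist Tsegue-maryam Guebrou albumartist Tsegue-maryam Guebrou album Ethiopiques, vol. 21: Emahoy (Piano Solo) discnumber 1 tracknumber 14 title The Story of the Wind date 2005-12-01 duration 166
Each output contains the following fields, in this order: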
status
file
artist
albumartist
album
discnumber
tracknumber
title
date
duration
I have been learning a lot of Python lately and have spent the whole day trying to crack this. How would I build a dictionary like this from the output?
cmus_output = {
    "status": "playing",
    "file": "/home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac",
    "artist": "Tsegue-maryam Guebrou",
    "albumartist": "Tsegue-maryam Guebrou",
    "album": "Ethiopiques, vol. 21: Emahoy (Piano Solo)",
    "discnumber": "1",
    "tracknumber": "14",
    "title": "The Story of the Wind",
    "date": "2005-12-01",
    "duration": "166"
}
I have tried many things without success. I did find this on the cmus wiki:
import sys

def status_data(item):
    """Return the requested cmus status data."""
    # We loop through cmus status data and use each of its known data
    # types as 'delimiters', collecting data until we reach one,
    # inserting it into the dictionary -- rinse and repeat.
    # cmus helper script provides our data as argv[1].
    cmus_data = sys.argv[1]
    # Split the data into an easily-parsed list.
    cmus_data = cmus_data.split()
    # Our temporary collector list.
    collector = []
    # Dictionary that will contain our parsed-out data.
    cmus_info = {'status': "", 'file': "", 'artist': "", 'album': "",
                 'discnumber': "", 'tracknumber': "", 'title': "",
                 'date': "", 'duration': ""}
    # Loop through cmus data and write it to our dictionary.
    last_found = "status"
    for value in cmus_data:
        collector.append(value)
        # Check to see if cmus value matches dictionary key.
        for key in cmus_info:
            # If a match has been found, record the data.
            if key == value:
                collector.pop()
                cmus_info[last_found] = " ".join(collector)
                collector = []
                last_found = key
    # Return whatever data main() requests.
    return cmus_info[item]
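Sadly, after a lot of testing, this does not seem to return the duration, which is what my program needs.
Solution
As @snakecharmerb pointed out in the comments, you collect the data for the last key (duration) but never add it to the dictionary.
You have to add one line after the loop: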
for value in cmus_data:
    collector.append(value)
    ...
cmus_info[last_found] = " ".join(collector)  # here, last_found is "duration"
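But what happens if a key appears inside the artist name (or the song title, album name, and so on)? You get a wrong result. (Note that, for testing, I use a helper function that extracts the data from a string into a dictionary.)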
def extract_data(cmus_data):
    cmus_data = cmus_data.split()
    collector = []
    # Dictionary that will contain our parsed-out data.
    cmus_info = {'status': "", 'file': "", 'artist': "", 'album': "",
                 'discnumber': "", 'tracknumber': "", 'title': "",
                 'date': "", 'duration': ""}
    last_found = "status"
    for value in cmus_data:
        collector.append(value)
        for key in cmus_info:
            if key == value:
                collector.pop()
                cmus_info[last_found] = " ".join(collector)
                collector = []
                last_found = key
    # Store the data collected for the last key.
    cmus_info[last_found] = " ".join(collector)
    return cmus_info

print(extract_data("status playing file /home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac artist Tsegue-maryam Guebrou album Ethiopiques, vol. 21: Emahoy (Piano Solo) discnumber 1 tracknumber 14 title The Story of the status date 2005-12-01 duration 166"))
Note the "title The Story of the status" part of the input. Output:
{'status': '', 'file': '/home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac', 'artist': 'Tsegue-maryam Guebrou', 'album': 'Ethiopiques, vol. 21: Emahoy (Piano Solo)', 'discnumber': '1', 'tracknumber': '14', 'title': 'The Story of the', 'date': '2005-12-01', 'duration': '166'}
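The word "status" in the title was treated as a key, so the earlier status value was overwritten with an empty string and the title was cut short.
Since you know the order in which the keys are expected, you should take advantage of that information: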
def extract_data2(cmus_data):
    cmus_data = cmus_data.split()
    collector = []
    # Keys in the order in which they are expected to appear.
    cmus_keys = ['status', 'file', 'artist', 'album', 'discnumber',
                 'tracknumber', 'title', 'date', 'duration']
    cmus_info = {}
    last_found = None
    it = iter(cmus_keys)
    expected_key = next(it)  # the first key
    for value in cmus_data:
        if value == expected_key:
            if last_found is not None:  # not the first key
                cmus_info[last_found] = " ".join(collector)
                collector = []
            last_found = expected_key
            expected_key = next(it, None)  # we know the next expected key
        else:
            collector.append(value)
    # Store the data collected for the last key.
    cmus_info[last_found] = " ".join(collector)
    return cmus_info

print(extract_data2("status playing file /home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac artist Tsegue-maryam Guebrou album Ethiopiques, vol. 21: Emahoy (Piano Solo) discnumber 1 tracknumber 14 title The Story of the status date 2005-12-01 duration 166"))
Output (abridged):
{'status': 'playing', ..., 'title': 'The Story of the status', ..., 'duration': '166'}
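It works, but you should also check the data format as you go (for example, that duration is an integer, and so on) until the collector is exhausted.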
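Such checks could look roughly like the sketch below. The function name check_cmus_info is hypothetical, and the YYYY-MM-DD date format is an assumption taken from the sample output in the question; real cmus data may use other date formats.

```python
from datetime import datetime

def check_cmus_info(cmus_info):
    """Return a copy of cmus_info with numeric fields converted to int.

    Hypothetical helper: raises ValueError if a numeric field is not an
    integer or if the date is not in YYYY-MM-DD form (an assumption based
    on the sample output).
    """
    checked = dict(cmus_info)
    for key in ("discnumber", "tracknumber", "duration"):
        if key in checked:
            checked[key] = int(checked[key])  # ValueError if not an integer
    if "date" in checked:
        # strptime raises ValueError when the string does not fit the format.
        datetime.strptime(checked["date"], "%Y-%m-%d")
    return checked
```

Calling it on the dictionary returned by extract_data2 would turn '166' into the integer 166 and raise ValueError for something like a duration of 'abc'.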
The simplest solution is probably a regular expression:
import re

# Raw string so that \d reaches the regex engine unchanged.
REGEX = re.compile(r"^status (?P<status>.+?) file (?P<file>.+?) artist (?P<artist>.+?) album (?P<album>.+?) discnumber (?P<discnumber>\d+?) tracknumber (?P<tracknumber>\d+?) title (?P<title>.+?) date (?P<date>\d{4}-\d{2}-\d{2}) duration (?P<duration>\d+?)$")

def extract_data3(cmus_data):
    # match() returns None if the input does not fit the pattern,
    # in which case this line raises AttributeError.
    return REGEX.match(cmus_data).groupdict()

print(extract_data3("status playing file /home/admin/Archive/Public/Music/Artists/Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) (Album)/14 The Story of the Wind.flac artist Tsegue-maryam Guebrou album Ethiopiques, vol. 21: Emahoy (Piano Solo) discnumber 1 tracknumber 14 title The Story of the status date 2005-12-01 duration 166"))
Output (abridged):
{'status': 'playing', ..., 'duration': '166'}
Of course, this is still fragile, because the format may have exceptions (for example, something like "date unknown" would make the function fail).
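One way to soften this, as a sketch only: accept any non-space token for the fields that are usually numeric or dates, make albumartist optional, and return None instead of raising when nothing matches. The names TOLERANT_REGEX and extract_data_tolerant are hypothetical, and the assumption that albumartist may or may not be present comes only from the samples in this thread.

```python
import re

# Hypothetical, more permissive variant of the pattern above.
TOLERANT_REGEX = re.compile(
    r"^status (?P<status>\S+)"
    r" file (?P<file>.+?)"
    r" artist (?P<artist>.+?)"
    r"(?: albumartist (?P<albumartist>.+?))?"  # optional field
    r" album (?P<album>.+?)"
    r" discnumber (?P<discnumber>\S+)"
    r" tracknumber (?P<tracknumber>\S+)"
    r" title (?P<title>.+?)"
    r" date (?P<date>\S+)"                     # also accepts e.g. "unknown"
    r" duration (?P<duration>\S+)$"
)

def extract_data_tolerant(cmus_data):
    """Return a dict of fields, or None if the input does not match."""
    match = TOLERANT_REGEX.match(cmus_data)
    return match.groupdict() if match else None
```

With this variant the caller has to handle both a None result and fields whose value is None (albumartist when it is absent) or not numeric, so the validation is moved out of the pattern rather than removed.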
Another, simpler way is:
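# cmus_data is the raw status string. "date" must be in the list as well,
# otherwise it ends up inside the title value.
cmus_keys = ["status", "file", "artist", "albumartist", "album", "discnumber", "tracknumber", "title", "date", "duration"]
# Surround every key with "#" so the string can be split on it.
for key in cmus_keys:
    cmus_data = cmus_data.replace(key + " ", key + "#").replace(" " + key, "#" + key)
cmus_list = cmus_data.split("#")
# Even positions hold keys, odd positions hold values.
cmus_info = dict(zip(cmus_list[::2], cmus_list[1::2]))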
Here, cmus_info will contain the data as key-value pairs.
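To try this approach on the sample from the question, it can be wrapped in a function, roughly as sketched below. The names CMUS_KEYS and extract_data4, the sep parameter, and the guard against the separator already being present are all additions made for illustration.

```python
CMUS_KEYS = ["status", "file", "artist", "albumartist", "album",
             "discnumber", "tracknumber", "title", "date", "duration"]

def extract_data4(cmus_data, sep="#"):
    """Split cmus_data into a dict by marking every key with sep."""
    if sep in cmus_data:
        # The trick only works if the separator is not part of the data.
        raise ValueError("separator %r already occurs in the data" % sep)
    for key in CMUS_KEYS:
        cmus_data = (cmus_data.replace(key + " ", key + sep)
                              .replace(" " + key, sep + key))
    cmus_list = cmus_data.split(sep)
    # Even positions hold keys, odd positions hold values.
    return dict(zip(cmus_list[::2], cmus_list[1::2]))

sample = ("status playing file /home/admin/Archive/Public/Music/Artists/"
          "Tsegue-maryam Guebrou/Ethiopiques vol. 21 Emahoy (Piano Solo) "
          "(Album)/14 The Story of the Wind.flac artist Tsegue-maryam Guebrou "
          "albumartist Tsegue-maryam Guebrou album Ethiopiques, vol. 21: "
          "Emahoy (Piano Solo) discnumber 1 tracknumber 14 title The Story "
          "of the Wind date 2005-12-01 duration 166")
print(extract_data4(sample)["duration"])
```

Like the first loop-based version, this treats a key word as a delimiter wherever it appears, so a title such as "The Story of the status" would still be split in the wrong place.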