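Problem description
I am processing the data below with Python. Its first four fields are separated by "|", and the fields from the fifth onward are separated by spaces.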
VER:1|long=|lat=|device=D3052|eventid=31007311 status=Active time=1528496310749 priority=1 desitnationHost= group=cluster1
VER:1|long=|lat=|device=D3010|eventid=31007312 status=Active time=1528496310765 priority=1 desitnationHost= group=cluster1
VER:1|long=|lat=|device=D3094|eventid=31007313 status=Active time=1528496315380 priority=1 desitnationHost= group=cluster1
VER:1|long=|lat=|device=D3052|eventid=31007314 status=Active time=1528496317513 priority=1 desitnationHost= group=cluster1
VER:1|long=|lat=|device=D3010|eventid=31007315 status=Active time=1528496329604 priority=1 desitnationHost= group=cluster1
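The time field holds an epoch timestamp in milliseconds, and that value needs to be moved forward by 1 year.
The data is spread across multiple text files in a directory, and each file has to be processed line by line.
My Python approach so far: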
#import required python library
import os
import re
#read a text file (later need to loop through multiple text files)
h = open('C:/directory/new_1.txt','r')
# Reading from the file
content = h.readlines()
# Iterating through the content
# Of the file
for line in content:
    milli_second_in_year = 31536000000
    l = re.sub(r'time=(\d+)',r'\1d','milli_second_in_year')
    print(l)
With the approach above, I cannot add milli_second_in_year to the extracted time value.
I tried the following change, but it does not produce the expected output:
for line in content:
    m = re.search(r'time=(\d+)',line)
    match = m.group(1)
    match = int(match)+31536000000
    print(match)
Expected output (with updated time values):
VER:1|long=|lat=|device=D3052|eventid=31007311 status=Active time=1560032310749 priority=1 desitnationHost= group=cluster1
VER:1|long=|lat=|device=D3010|eventid=31007312 status=Active time=1560032310765 priority=1 desitnationHost= group=cluster1
VER:1|long=|lat=|device=D3094|eventid=31007313 status=Active time=1560032315380 priority=1 desitnationHost= group=cluster1
VER:1|long=|lat=|device=D3052|eventid=31007314 status=Active time=1560032317513 priority=1 desitnationHost= group=cluster1
VER:1|long=|lat=|device=D3010|eventid=31007315 status=Active time=1560032329604 priority=1 desitnationHost= group=cluster1
Solution
If I understand correctly what you want to do, you can do the following:
import re

milli_second_in_year = 31536000000
with open('C:/directory/new_1.txt','r') as f:
    with open('C:/directory/new_1_adapted.txt','w+') as fnew:
        for line in f:
            # Extract the digits that follow "time="
            m = re.search(r'time=(\d+)',line)
            time_value = m.group(1)
            new_time_value = str(int(time_value) + milli_second_in_year)
            # Include the "time=" prefix so that identical digits elsewhere in the line stay untouched
            newline = line.replace('time=' + time_value,'time=' + new_time_value)
            fnew.write(newline)
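A few things to note:
- Opening files with a context manager (with open...) ensures they are always closed properly.
- There is no need for readlines; you can iterate over the lines directly through the file handle.
- I was not sure whether you want to overwrite the same file. In that case, you either have to write to a second file, then delete the first and rename the second, or collect the lines in a list and write them back after the file has been closed (I have added such a version below).
- Your use of re.sub is incorrect. If you want to use it, check the documentation (I do not use it here).
- I have not added any error handling, so this may crash if a file is malformed.
- Finally: I have not tested this, so there may be mistakes...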
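For reference, the re.sub route mentioned in the notes can be sketched as follows. re.sub accepts a function as its replacement argument; the function receives each match object and returns the replacement string, which is where the addition can happen. The names shift_time, add_offset, TIME_PATTERN and MILLISECONDS_IN_YEAR are made up for this illustration, and the sketch assumes a "year" means 365 days, as in the question.

```python
import re

# Assumption: one year = 365 days, matching the constant used in the question.
MILLISECONDS_IN_YEAR = 365 * 24 * 60 * 60 * 1000  # 31536000000

# \b keeps the pattern from matching inside a longer key such as "uptime=".
TIME_PATTERN = re.compile(r'\btime=(\d+)')


def shift_time(line, offset=MILLISECONDS_IN_YEAR):
    """Return line with the number after 'time=' increased by offset."""
    def add_offset(match):
        # match.group(1) holds only the digits; rebuild the full "time=..." token.
        return 'time=' + str(int(match.group(1)) + offset)
    return TIME_PATTERN.sub(add_offset, line)


sample = ('VER:1|long=|lat=|device=D3052|eventid=31007311 status=Active '
          'time=1528496310749 priority=1 desitnationHost= group=cluster1')
print(shift_time(sample))
```

A line without a time= field comes back unchanged, so this variant does not raise on such lines the way calling m.group(1) on a failed re.search would.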
Here is a version that overwrites the same file:
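import re

milli_second_in_year = 31536000000
file_path = 'C:/directory/new_1.txt'
new_lines = []
# First pass: read every line and build the updated version in memory
with open(file_path,'r') as f:
    for line in f:
        m = re.search(r'time=(\d+)',line)
        time_value = m.group(1)
        new_time_value = str(int(time_value) + milli_second_in_year)
        # Include the "time=" prefix so that identical digits elsewhere in the line stay untouched
        new_line = line.replace('time=' + time_value,'time=' + new_time_value)
        new_lines.append(new_line)
# Second pass: the file is closed by now, so it can be reopened for writing
with open(file_path,'w') as f:
    f.writelines(new_lines)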
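Since the question mentions several text files in one directory, here is a rough sketch of how the same idea might be applied to all of them. It takes the other route described in the notes: write to a second file, then rename it over the original. The function names (shift_line, shift_file_in_place, shift_directory) and the assumption that every relevant file ends in .txt are mine, not from the question, and I have only tried this on small sample files.

```python
import os
import re
import tempfile

MILLISECONDS_IN_YEAR = 31536000000  # assumption: 365-day year, as in the question
TIME_PATTERN = re.compile(r'\btime=(\d+)')


def shift_line(line, offset):
    """Add offset to the number after 'time='; other lines pass through unchanged."""
    return TIME_PATTERN.sub(
        lambda m: 'time=' + str(int(m.group(1)) + offset), line)


def shift_file_in_place(path, offset=MILLISECONDS_IN_YEAR):
    """Rewrite one file via a temporary file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        # newline='' leaves the original line endings as they are.
        with os.fdopen(fd, 'w', newline='') as dst, \
                open(path, 'r', newline='') as src:
            for line in src:
                dst.write(shift_line(line, offset))
        # Swap the finished temporary file in only after it is fully written.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise


def shift_directory(directory, offset=MILLISECONDS_IN_YEAR):
    """Process every file in directory whose name ends in .txt (assumed naming)."""
    for name in sorted(os.listdir(directory)):
        if name.endswith('.txt'):
            shift_file_in_place(os.path.join(directory, name), offset)
```

With the path from the question, the call would be shift_directory('C:/directory'). Because each file is only swapped in after it has been written completely, an error halfway through leaves the original file as it was.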