问题描述
我在python中创建了一个脚本,用于修复.srt文件中错误编码的土耳其语字符。 例如用正确的字符“ı”代替“ý”。
我打开文件(读取),将行遍历到.replace('ý','ı')
,然后使用'w',encoding='utf8'
将新的行集写入新文件。
第一次很棒!问题在于,每次迭代都通过用其他2个字符替换固定字符来弄乱固定字符。如有需要,可以提供更多信息!
部分输入内容:
yakýn deðillerdi,ama
bir þeyler yapmak istedim
第一次输出:
yakın değillerdi,ama
bir şeyler yapmak istedim
第二次输出:
yakın değillerdi,ama
bir ÅŸeyler yapmak istedim
第三次输出:
yakın değillerdi,ama
bir ÅŸeyler yapmak istedim
每次运行都会变得更糟。有什么想法吗?如果我不得不猜测,我找到的字符('ý')与文件中已有的('ı')匹配,然后将其替换为('ı'),该字符被错误地编码为('ı' )?这也不是每次都进行的系统更改(请参阅第二->第三迭代),所以我很困惑。 我是个新手,所以请原谅我可能没有的任何“明显”知识!
编辑: 要求的代码:
import os
directoryPath = 'D:\\tv\\b99'
fileTypes = ['.srt']
fullFilePaths = []
def get_filepaths(directory,filetype):
"""
This function will generate the file names in a directory
tree by walking the tree either top-down or bottom-up. For each
directory in the tree rooted at directory top (including top itself),it yields a 3-tuple (dirpath,dirnames,filenames).
"""
filePathslist = []
for root,directories,files in os.walk(directory):
for filename in files:
# Join the two strings in order to form the full filepath.
filepath = os.path.join(root,filename)
# include only the specific file types,except their hidden/shadow files
if filepath.endswith(filetype) and not filename.startswith('.'):
filePathslist.append(filepath) # Add it to the list.
return filePathslist
n=0
def replaceChars(folderAsListOfPaths):
"""
This function takes a list as argument,containing file paths.
The file is read line by line,and for each of the "special"
characters in Turkish that get encoded incorrectly,the appropriate
replacement - shown below - is made,and the existing file is overwritten.
('ý'->'ı') / ('Ý'->'İ') / ('þ'->'ş') / ('Þ'->'Ş') / ('ð'->'ğ')
The filenames are printed when the replacement is done,for confirmation.
"""
# read file line by line
file = open(folderAsListOfPaths[n],"r")
lines = file.readlines()
newFileContent = ''
for line in lines:
origLine = line
fixedLine = origLine.replace('ý','ı')
fixedLine = fixedLine.replace('Ý','İ')
fixedLine = fixedLine.replace('þ','ş')
fixedLine = fixedLine.replace('Þ','Ş')
fixedLine = fixedLine.replace('ð','ğ')
newFileContent += fixedLine
file.close()
newFile = open(folderAsListOfPaths[n],'w',encoding='utf8')
# print(newFileContent)
newFile.write(newFileContent)
newFile.close()
cleaned_name = folderAsListOfPaths[n].replace(directoryPath,'')
cleaned_name = cleaned_name.replace('\\','')
print(cleaned_name)
for type in fileTypes:
fullFilePaths.extend(get_filepaths(directoryPath,type))
# filled the fullFilePaths list with the files
print('Finished with files:')
for file in fullFilePaths: # for every file in this folder
replaceChars(fullFilePaths) # replace the characters
n+=1 # move onto the next file
解决方法
暂无找到可以解决该程序问题的有效方法,小编努力寻找整理中!
如果你已经找到好的解决方法,欢迎将解决方案带上本链接一起发送给小编。
小编邮箱:dio#foxmail.com (将#修改为@)