Google Apps Script: regex for fixing a malformed pipe-delimited CSV file runs too slowly

Problem description

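I have a Google Apps Script that processes this "csv" file every day. The file keeps getting larger, and the script is starting to time out. The pipe-delimited "csv" file contains carriage returns and line feeds in the comment field of some records, which causes those records to break before the true end of the record. The following code removes the extra line breaks in the middle of a record and formats the data as a usable csv. Is there a more efficient way to write this code?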

Here is the code snippet:

function cleanCSV(csvFileId){
// The file we receive has line breaks in the middle of the records; this removes the line breaks and converts the file to a csv.
        var content = DriveApp.getFileById(csvFileId).getBlob().getDataAsString();
        var identifyNewLine = content.replace(/\r\n\d{1,5}\|/g,"~~$&"); //This marks the beginning of a new record with double tildes before we can remove all the line breaks.
        var noreturnsContent = identifyNewLine.replace(/\r\n/g,""); //Removes Returns
        var newContent = noreturnsContent.replace(/~~/g,"\r\n"); //returns one record per client
        var noEndQuote = newContent.replace(/'\|/g,"|"); // removes trailing single quote
        var csvContent = noEndQuote.replace(/\|'/g,"|"); // removes leading single quote
        //Logger.log(csvContent);
        var sheetId = DriveApp.getFolderById(csvFolderId).createFile(csvFileName,csvContent,MimeType.CSV).getId();
        return sheetId;
}

Here is a sample of the file.

Solution

The first three replace lines can be merged into one: you just want to remove all \r\n occurrences that are not followed by 1 to 5 digits and a |, using .replace(/\r\n(?!\d{1,5}\|)/g,"").

The last two replace lines can also be merged into one if you use alternation (|).
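
To see why the lookahead version can stand in for the first three calls, here is a minimal sketch that runs both on a made-up sample and compares the results. The sample string and the function names joinThreeStep and joinBrokenRecords are illustrative assumptions, not taken from the asker's file.

```javascript
// Approach from the question: mark record starts, strip all CRLFs, restore the marks.
function joinThreeStep(content) {
  return content
    .replace(/\r\n\d{1,5}\|/g, "~~$&")
    .replace(/\r\n/g, "")
    .replace(/~~/g, "\r\n");
}

// Single pass: remove only the CRLFs that are NOT followed by 1-5 digits and a pipe.
function joinBrokenRecords(content) {
  return content.replace(/\r\n(?!\d{1,5}\|)/g, "");
}

// Hypothetical sample: record 101 has a line break inside its comment field.
const sample = "ID|Comment\r\n101|first line\r\nsecond line\r\n102|ok";

console.log(JSON.stringify(joinBrokenRecords(sample)));
console.log(joinBrokenRecords(sample) === joinThreeStep(sample)); // true
```

The single pass avoids building two intermediate copies of the whole file, which is likely where most of the saving on large inputs comes from. It also no longer turns a literal ~~ in the data into a line break, which the marker approach would.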

Use these two replace calls in place of the original five.
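
Putting both merges together, the function might look like the sketch below. Assumptions: csvFolderId and csvFileName are passed as parameters here (the original snippet reads them from an outer scope), and the string work is split into a hypothetical helper, cleanContent, so it can be checked without Drive access. For the quote step the sketch swaps in /'?\|'?/g rather than a literal alternation such as /'\||\|'/g. When two quoted fields are adjacent ('a'|'b'), the alternation consumes the pipe together with the first quote and leaves the second quote behind, whereas the optional-quote pattern removes both, which matches what the two original calls do.

```javascript
// Pure string step, kept separate so it can be tested without Drive access.
function cleanContent(content) {
  return content
    // Drop every CRLF that is not followed by 1-5 digits and a pipe (a mid-record break).
    .replace(/\r\n(?!\d{1,5}\|)/g, "")
    // Strip at most one single quote on each side of every pipe.
    .replace(/'?\|'?/g, "|");
}

// Assumption: folder id and file name are parameters rather than outer-scope variables.
function cleanCSV(csvFileId, csvFolderId, csvFileName) {
  var content = DriveApp.getFileById(csvFileId).getBlob().getDataAsString();
  var csvContent = cleanContent(content);
  return DriveApp.getFolderById(csvFolderId)
    .createFile(csvFileName, csvContent, MimeType.CSV)
    .getId();
}
```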