Processing large files with Tcl

Problem description

Can anyone tell me how to update the following procedure to handle large files (size ...)?

proc read_text_file { file } {
    set fp [open ${file} r]
    set return_data ""

    while { [gets $fp each_line] != -1 } {
        lappend return_data ${each_line}
    }
    close $fp

    return ${return_data}
}

My goal is to read a huge file line by line with a better run time.

Thanks.

Solution

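When you have a very large file, you definitely want to avoid bringing it all into memory at once. (Also, Tcl 8.* has a memory chunk allocation limit that makes bringing in 50 GB of data intensely exciting. It's a long-standing API bug that is fixed in 9.0, which was in alpha at the time of writing, but you'll have to put up with it for now.)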
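For the plain line-by-line case in the question, a minimal sketch might handle each line as it is read instead of collecting every line in a list. The name `forEachLine` is a hypothetical helper, not a Tcl built-in, and `try` assumes Tcl 8.6 or later.

```tcl
# Hypothetical helper: run a script once per line of a file without ever
# holding more than the current line in memory.
#   filename - path of the file to read
#   lineVar  - name of the caller's variable that receives each line
#   body     - script evaluated in the caller's scope for every line
proc forEachLine {filename lineVar body} {
    upvar 1 $lineVar line
    set f [open $filename r]
    try {
        # gets returns -1 at end of file, so the loop stops cleanly
        while {[gets $f line] >= 0} {
            uplevel 1 $body
        }
    } finally {
        # The channel is closed even if the body raises an error
        close $f
    }
}

# Example use (the file name is only a placeholder):
#   set errors 0
#   forEachLine huge.log line {
#       if {[string match "*ERROR*" $line]} { incr errors }
#   }
```

Memory use then depends on the longest line rather than on the size of the file, at the cost of having to do the per-line work inside the loop.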

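If you can, do a pass over the file to identify where its interesting sub-chunks are. For the sake of argument, let's assume those are lines that match a pattern; here's an example that finds where the procedures are in a Tcl script (under some simple assumptions).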

proc buildIndices {filename} {
    set f [open $filename]
    set indices {}
    try {
        while {![eof $f]} {
            set idx [tell $f]
            set line [gets $f]
            if {[regexp {^proc (\w+)} $line -> name]} {
                dict set indices $name $idx
            }
        }
        return $indices
    } finally {
        close $f
    }
}

Now that you have the indices, you can extract a procedure from the file like this:

proc obtainProcedure {filename procName indices} {
    set f [open $filename]
    try {
        seek $f [dict get $indices $procName]
        set procedureDefinition ""
        while {[gets $f line] >= 0} {
            append procedureDefinition $line "\n"
            if {[info complete $procedureDefinition]} {
                # We're done; evaluate the script in the caller's context
                tailcall eval $procedureDefinition
            }
        }
    } finally {
        close $f
    }
}

You'd use it like this:

# Once (possibly even save this to its own file)
set indices [buildIndices somefile.tcl]
# Then, to use
obtainProcedure somefile.tcl foobar $indices
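
Saving the indices to their own file, as the comment above suggests, could look roughly like this. A Tcl dictionary has a plain string form, so it can be written out and read back directly. The names `saveIndices` and `loadIndices` are hypothetical, and the sketch assumes the indexed file does not change afterwards (the stored offsets would go stale if it did).

```tcl
# Hypothetical helper: write the index dictionary to a file as plain text.
proc saveIndices {indexFile indices} {
    set f [open $indexFile w]
    try {
        puts -nonewline $f $indices
    } finally {
        close $f
    }
}

# Hypothetical helper: read the index dictionary back. The index is small
# compared with the data file, so reading it in one go is reasonable.
proc loadIndices {indexFile} {
    set f [open $indexFile r]
    try {
        return [read $f]
    } finally {
        close $f
    }
}

# Example use (file names are placeholders):
#   saveIndices somefile.idx [buildIndices somefile.tcl]
#   obtainProcedure somefile.tcl foobar [loadIndices somefile.idx]
```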

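If you're doing this a lot, convert your code to use a database; they're much more efficient in the long run. Building the index is equivalent to building a database, and the other procedure is equivalent to doing a database query.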
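
As a rough sketch of that idea, assuming the `sqlite3` Tcl package is installed, the index could live in a small SQLite file. The table name `procIndex`, its columns, and the procedure names `buildIndexDb` and `lookupOffset` are all made up for illustration; the scanning loop is the same as in buildIndices above.

```tcl
package require sqlite3

# Hypothetical: scan the file once and store each procedure's byte offset
# in an SQLite table (one row per procedure name).
proc buildIndexDb {dbFile filename} {
    sqlite3 db $dbFile
    try {
        db eval {
            CREATE TABLE IF NOT EXISTS procIndex(
                name TEXT PRIMARY KEY,
                pos  INTEGER
            )
        }
        set f [open $filename]
        try {
            # A single transaction keeps the bulk insert fast
            db transaction {
                while {![eof $f]} {
                    set idx [tell $f]
                    set line [gets $f]
                    if {[regexp {^proc (\w+)} $line -> name]} {
                        # $name and $idx are bound as SQL parameters
                        db eval {INSERT OR REPLACE INTO procIndex VALUES($name, $idx)}
                    }
                }
            }
        } finally {
            close $f
        }
    } finally {
        db close
    }
}

# Hypothetical: look up the stored offset; returns an empty string when
# the procedure name is not in the index.
proc lookupOffset {dbFile procName} {
    sqlite3 db $dbFile
    try {
        return [db onecolumn {SELECT pos FROM procIndex WHERE name = $procName}]
    } finally {
        db close
    }
}
```

The offset returned by lookupOffset can then be passed to seek in the same way obtainProcedure uses the dictionary entry.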
