我编写了一个简单的脚本,它应该读取整个目录,然后通过删除
HTML标记将
HTML数据解析为普通脚本,然后将其写入一个文件.
我有8GB内存和大量可用的虚拟内存.当我这样做时,我有超过5GB的RAM可用.目录中最大的文件是3.8 GB.
脚本是
file_count = 1 File.open("allscraped.txt",'w') do |out1| for file_name in Dir["allParts/*.dat"] do puts "#{file_name}#:#{file_count}" file_count +=1 File.open(file_name,"r") do |file| source = "" tmp_src = "" counter = 0 file.each_line do |line| scraped_content = line.gsub(/<.*?\/?>/,'') tmp_src << scraped_content if (counter % 10000) == 0 tmp_src = tmp_src.gsub( /\s{2,}/,"\n" ) source << tmp_src tmp_src = "" counter = 0 end counter += 1 end source << tmp_src.gsub( /\s{2,"\n" ) out1.write(source) break end end end
realscraper.rb:33:in `block (4 levels) in <main>': Failed to allocate memory (No MemoryError) from realscraper.rb:27:in `each_line' from realscraper.rb:27:in `block (3 levels) in <main>' from realscraper.rb:23:in `open' from realscraper.rb:23:in `block (2 levels) in <main>' from realscraper.rb:13:in `each' from realscraper.rb:13:in `block in <main>' from realscraper.rb:12:in `open' from realscraper.rb:12:in `<main>'
第27行是file.each_line do | line |和33是源<< tmp_src.失败的文件是最大的文件(3.8 GB).这里有什么问题?即使我有足够的内存,为什么我会收到此错误?另外我该如何解决?