javascript – Reading n lines of a large text file

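The smallest file I have has more than 850k lines, and each line has an unknown length. The goal is to read n lines from this file in the browser; reading the whole file is not an option.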

Here is the HTML, <input type="file" name="file" id="file">, and the JS I have:

var n = 10;
var reader = new FileReader();
reader.onload = function(progressEvent) {
  // Entire file
  console.log(this.result);

  // By lines
  var lines = this.result.split('\n');
  for (var line = 0; line < n; line++) {
    console.log(lines[line]);
  }
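};
// Start reading the selected file (this loads the whole file into memory)
reader.readAsText(document.getElementById('file').files[0]);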

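Obviously, the problem is that it first loads the entire file and only then splits it on newlines. So regardless of n, it tries to read the whole file, and when the file is large it ends up reading nothing at all.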

What should I do?

Note: I am willing to delete the whole function and start from scratch, as long as I can call console.log() on every line that we read.

*"Each line has an unknown length" means that the file looks like this:

(0,(1,2))
(1,(4,5,6))
(2,(7))
(3,(8))

Edit:

The way to go seems to be something like filereader api on big files, but I can't see how to modify it to read n lines of the file...

Using Uint8Array to string in Javascript, I can get this far:

var view = new Uint8Array(fr.result);
var string = new TextDecoder("utf-8").decode(view);
console.log("Chunk " + string);

But a chunk may not contain the whole last line, so how would you work out the lines that follow? For example, this is what it printed:

((7202),(u'11330875493',u'2554375661'))
((1667),(u'9079074735',u'6883914476',
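One way to handle the cut-off line is to keep the unfinished tail of each chunk and prepend it to the next chunk before splitting. A minimal sketch of that bookkeeping on plain strings, with no FileReader involved; `collectLines` and its parameters are hypothetical names chosen for illustration:

```javascript
// Collect at most `n` complete lines from a sequence of text chunks.
// The text after the last '\n' of a chunk may be an unfinished line,
// so it is carried over and prepended to the next chunk.
function collectLines(chunks, n) {
  var lines = [];
  var carry = '';
  for (var i = 0; i < chunks.length && lines.length < n; i++) {
    var parts = (carry + chunks[i]).split('\n');
    carry = parts.pop(); // possibly incomplete last line
    for (var j = 0; j < parts.length && lines.length < n; j++) {
      lines.push(parts[j]);
    }
  }
  // End of input: a non-empty leftover is a final line without '\n'.
  if (lines.length < n && carry !== '') {
    lines.push(carry);
  }
  return lines;
}

// The second line is cut in the middle by the chunk boundary.
console.log(collectLines(['(0,(1,2))\n(1,(4,', '5,6))\n(2,(7))\n(3,(8))'], 3));
// → [ '(0,(1,2))', '(1,(4,5,6))', '(2,(7))' ]
```

Stopping as soon as `lines.length` reaches `n` is what lets a reader built on this idea avoid touching the rest of the file.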

Solution

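The logic is very similar to what I wrote in my answer to filereader api on big files, except that you need to keep track of the number of lines processed so far (and of the last line read so far, because it may not have ended yet). The example below works with any encoding that is compatible with UTF-8; if you need a different encoding, look at the options of the TextDecoder constructor.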
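The `{stream: true}` option passed to `decoder.decode` matters because a chunk boundary can fall in the middle of a multi-byte UTF-8 character. A small illustration, splitting the two bytes of "é" (0xC3 0xA9) across two calls; it assumes an environment with a global `TextDecoder` (browsers, or a recent Node.js):

```javascript
// "aé\n" in UTF-8: 'a', then 0xC3 0xA9 for 'é', then '\n'.
var bytes = new Uint8Array([0x61, 0xC3, 0xA9, 0x0A]);
var first = bytes.subarray(0, 2);  // "a" + first byte of "é"
var second = bytes.subarray(2);    // second byte of "é" + "\n"

// Without streaming, each call is decoded on its own and the split
// character turns into U+FFFD replacement characters.
var plain = new TextDecoder('utf-8');
var broken = plain.decode(first) + plain.decode(second);

// With {stream: true}, the decoder holds the incomplete byte sequence
// and completes it on the next call; a final decode() flushes it.
var streaming = new TextDecoder('utf-8');
var fixed = streaming.decode(first, {stream: true}) +
            streaming.decode(second, {stream: true}) +
            streaming.decode();

console.log(broken.includes('\uFFFD')); // true
console.log(fixed === 'a\u00e9\n');     // true
```

Reusing one decoder across all chunks, as the answer's code does, is what makes the streaming state carry over between `onload` calls.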

If you are sure that the input is ASCII (or any other single-byte encoding), you can also skip TextDecoder and read the input directly as text with FileReader's readAsText method.
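Calling readAsText on each slice decodes every slice independently, which is only safe when every character is exactly one byte. A rough simulation of that per-slice decoding, where a fresh `TextDecoder` per slice stands in for `readAsText` (the helper name `decodeEachSlice` is hypothetical):

```javascript
// Decode `bytes` in fixed-size slices, each with a fresh decoder,
// roughly what calling readAsText on every file.slice() amounts to.
function decodeEachSlice(bytes, sliceSize) {
  var text = '';
  for (var offset = 0; offset < bytes.length; offset += sliceSize) {
    var slice = bytes.subarray(offset, offset + sliceSize);
    text += new TextDecoder('utf-8').decode(slice);
  }
  return text;
}

var ascii = new TextEncoder().encode('(0,(1,2))\n(1,(4,5,6))\n');
var nonAscii = new TextEncoder().encode('caf\u00e9\n'); // 'é' is 2 bytes

// ASCII: every slice boundary falls between characters.
console.log(decodeEachSlice(ascii, 4) === '(0,(1,2))\n(1,(4,5,6))\n'); // true
// A boundary between the two bytes of 'é' corrupts it.
console.log(decodeEachSlice(nonAscii, 4) === 'caf\u00e9\n'); // false
```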

// This is just an example of the function below.
document.getElementById('start').onclick = function() {
    var file = document.getElementById('infile').files[0];
    if (!file) {
        console.log('No file selected.');
        return;
    }
    var maxlines = parseInt(document.getElementById('maxlines').value,10);
    var lineno = 1;
    // readSomeLines is defined below.
    readSomeLines(file,maxlines,function(line) {
        console.log("Line " + (lineno++) + ": " + line);
    },function onComplete() {
        console.log('Read all lines');
    });
};

/**
 * Read up to and including |maxlines| lines from |file|.
 *
 * @param {Blob} file - The file to be read.
 * @param {integer} maxlines - The maximum number of lines to read.
 * @param {function(string)} forEachLine - Called for each line.
 * @param {function(error)} onComplete - Called when the end of the file
 *     is reached or when |maxlines| lines have been read.
 */
function readSomeLines(file, maxlines, forEachLine, onComplete) {
    var CHUNK_SIZE = 50000; // 50 kB, arbitrarily chosen.
    var decoder = new TextDecoder();
    var offset = 0;
    var linecount = 0;
    var results = '';
    var fr = new FileReader();
    fr.onload = function() {
        // Use stream:true in case we cut the file
        // in the middle of a multi-byte character
        results += decoder.decode(fr.result,{stream: true});
        var lines = results.split('\n');
        results = lines.pop(); // In case the line did not end yet.
        linecount += lines.length;
    
        if (linecount > maxlines) {
            // Read too many lines? Truncate the results.
            lines.length -= linecount - maxlines;
            linecount = maxlines;
        }
    
        for (var i = 0; i < lines.length; ++i) {
            forEachLine(lines[i] + '\n');
        }
        offset += CHUNK_SIZE;
        seek();
    };
    fr.onerror = function() {
        onComplete(fr.error);
    };
    seek();
    
    function seek() {
        if (linecount === maxlines) {
            // We found enough lines.
            onComplete(); // Done.
            return;
        }
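        if (offset !== 0 && offset >= file.size) {
            // We did not find all lines, but there are no more lines.
            // Flush any bytes the decoder is still holding on to.
            results += decoder.decode();
            if (results !== '') {
                // The unterminated last line, kept by lines.pop() earlier.
                forEachLine(results);
            }
            onComplete(); // Done
            return;
        }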
        var slice = file.slice(offset,offset + CHUNK_SIZE);
        fr.readAsArrayBuffer(slice);
    }
}
Read <input type="number" id="maxlines"> lines from
<input type="file" id="infile">.
<input type="button" id="start" value="Print lines to console">
