Loading the first N rows from an .RData file

Problem description

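I searched around on Google but could not find an answer to my question. Functions such as scan (base package) and fread (data.table package) do a great job of reading the first N rows, as specified by the user, from .txt or .csv files. With .RData, however, load reads the entire file, and there is no way to specify how many values should be read from it.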

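I have .RData files larger than 3 GB, each containing a data.frame or data.table, and I do not always need to load the whole file, only the first 100 or 1,000 rows of the first object. Is there a way to do this?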

Solution

My guess is that there is no off-the-shelf solution for this.

If we look at a sample RDS file that is ASCII-encoded and uncompressed, we can see that it is stored in column-major order:

saveRDS(mtcars[1:5,1:2],"testrds.rds",ascii = TRUE,compress = FALSE)

This produces the following file (comments inserted by me):

A        ## ASCII file
3        ## some version info and ??
262146
197888
6
CP1252
787
2
14
5       ## This seems to indicate 5 items in this vector (column)
21      ## first column starts here (but how would you know?)
21
22.8
21.4
18.7    ## first column ends here
14
5       ## Again,This seems to indicate 5 items in this vector (column)
6       ## second column starts here
6
4
6
8       ## second column ends here
1026
1
262153    # Attributes start here: names,row.names,class 
5
names                ## col names
16
2
262153
3
mpg                  ### first col name
262153
3
cyl                  ### second col name
1026
1
262153
9
row.names            ## 2nd attribute: row.names 
16
5
262153
9
Mazda\040RX4         ### first row name
262153
13
Mazda\040RX4\040Wag  ### second row name
262153
10
Datsun\040710        ### ...
262153
14
Hornet\0404\040Drive
262153
17
Hornet\040Sportabout ### last row name
1026
1
262153
5
class                ## 3rd attribute: class
16
1
262153
10
data.frame           ### value of class
254

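As this simple RDS file shows, reading the first few rows of data still requires parsing the whole file and knowing which lines to skip. You would also want more documentation of the RDS format than the R Internals manual provides.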

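Based on this simple example, you could probably make some guesses and get a rough draft of a function working for RDS files that you know contain data frames. It would take some work, though, to make it robust enough for more complex data frames (for example, ones with factor and Date columns). RData files have a similar but slightly more complicated format, because they can hold multiple objects.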
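To make the "rough draft" idea concrete, here is a minimal sketch, not a robust implementation. The name head_ascii_rds is hypothetical. It assumes a file written with saveRDS(..., ascii = TRUE, compress = FALSE) in serialization version 2 or 3, whose top-level object is a data frame made up only of plain double columns, as in the dump above. It does not read the attributes, so the column names are placeholders, and it still has to read past every value of every column.

```r
# Hypothetical helper, written only for the simple layout shown above.
head_ascii_rds <- function(path, n = 5) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Header: "A", serialization version, writer version, minimum reader version.
  header <- readLines(con, n = 4)
  stopifnot(header[1] == "A")
  # Version 3 adds two more lines: length and name of the native encoding.
  if (header[2] == "3") readLines(con, n = 2)

  # Top-level object: the low byte of the flags holds the type (19 = list).
  flags <- as.integer(readLines(con, n = 1))
  stopifnot(flags %% 256L == 19L)
  ncols <- as.integer(readLines(con, n = 1))

  cols <- vector("list", ncols)
  for (j in seq_len(ncols)) {
    type <- as.integer(readLines(con, n = 1))
    stopifnot(type == 14L)              # 14 = double vector without attributes
    len <- as.integer(readLines(con, n = 1))
    values <- readLines(con, n = len)   # the whole column is still consumed
    cols[[j]] <- as.numeric(head(values, n))
  }

  # Attributes (names, row.names, class) follow the columns and are ignored here.
  names(cols) <- paste0("V", seq_len(ncols))
  as.data.frame(cols)
}

# Usage with a temporary copy of the sample file from above.
sample_path <- tempfile(fileext = ".rds")
saveRDS(mtcars[1:5, 1:2], sample_path, ascii = TRUE, compress = FALSE)
head_ascii_rds(sample_path, n = 3)
```

Anything beyond this layout (integer, character, factor, or Date columns, or compressed or binary files) would trip the stopifnot checks, which is the point made above about how much work a robust version would need.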

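All in all, I think RDS and RData are poor choices for data that you may want to load partially. You would be better off with CSV or TSV, where you can use the standard options mentioned in the question (or vroom::vroom) to load only the data you need into memory.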
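As a small illustration of the CSV route, the sketch below uses base R only. The temporary file and the use of mtcars are stand-ins for real data. The nrows argument of read.csv caps the number of data rows that are read, and fread from data.table takes an nrows argument for the same purpose.

```r
# Write a sample data frame to a temporary CSV file (stand-in for real data).
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# nrows caps the number of data rows that are read from the file.
first_rows <- read.csv(csv_path, nrows = 5)
print(first_rows)
```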

What about this simple workaround?

my_data <- head(readRDS("my_data.RDS"),n = 1000)

Set the n argument of head() as needed.

If you plan to do this often, you could even wrap it in a small function.
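One possible shape for such a function is sketched below. The names read_rds_head and load_rdata_head are hypothetical, and the second helper is an assumed .RData counterpart that returns rows from the first object stored in the file. Neither avoids reading the whole file: readRDS and load still bring the full object into memory before head() trims it.

```r
# Hypothetical wrapper around the one-liner above.
# Note: readRDS() still reads the whole file before head() trims it.
read_rds_head <- function(path, n = 1000) {
  head(readRDS(path), n = n)
}

# Hypothetical .RData counterpart: load into a scratch environment,
# then return the first n rows of the first object stored in the file.
load_rdata_head <- function(path, n = 1000) {
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the object names
  head(get(object_names[1], envir = env), n = n)
}

# Usage with temporary files standing in for the real data.
rds_path <- tempfile(fileext = ".rds")
saveRDS(mtcars, rds_path)
read_rds_head(rds_path, n = 3)

rdata_path <- tempfile(fileext = ".RData")
save(mtcars, file = rdata_path)
load_rdata_head(rdata_path, n = 3)
```

Loading into a separate environment keeps the large object out of the global workspace, so it can be garbage-collected once the function returns.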

Try read_lines_raw from the readr package.

