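Problem description
The code below creates raw data similar to what I am working with. I wrote some code that reformats it using the add_row function from the tibble package. Now I am getting an error (this code worked as of April 2020). It seems the rules for subsetting have become stricter with a package update? I am wondering if anyone can help fix this error... First, create the data: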
# Create replicate of raw data
date <- seq(from = as.Date('1999-01-01'),to = as.Date('2013-12-31'),by = 'day')
temp <- rnorm(5479,15,5)
precip <- rlnorm(5479)
rawdata <- data.frame(date=date,temp=round(temp,digits = 2),precip=round(precip,digits = 2))
# Add columns needed to run code
rawdata$year <- as.numeric(substr(rawdata$date,1,4))
rawdata$month <- as.numeric(substr(rawdata$date,6,7))
rawdata$chardate <- format(rawdata$date,'%Y-%h-%d') # create abbreviated month column
rawdata$charmonth <- substr(rawdata$chardate,6,8) # three-letter month abbreviation, for formatting
rawdata$charmonth <- as.character(rawdata$charmonth)
rawdata$day <- as.numeric(substr(rawdata$date,9,10))
rawdata$uniqdate <- rawdata$year*100+as.numeric(rawdata$day)+rawdata$month*10
rawdata$uniqmonth <- (rawdata$year*100)+rawdata$month# create unique month identifier
rawdata$yr <- NA # This column will be filled only in the new rows to be added
# Create weather object to feed the for loop below ----
weather <- data.frame(year = rawdata$year,month = rawdata$month,day = rawdata$day,charmonth = rawdata$charmonth,uniqmonth = rawdata$uniqmonth,uniqdate = rawdata$uniqdate,temp = rawdata$temp,precip = rawdata$precip,yr = rawdata$yr)
# weather$charmonth <- as.character(rawdata$charmonth)
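Now for the error... I am trying to add a row at the top of each month's data containing the number of days in that month, the three-letter month abbreviation (Jan, Feb, Mar, etc.), and the year.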
library(tibble) # package containing the add_row function
# create empty list to put all of the monthly dataframes in
newdat <- list()
# the following loop will create a dataframe for each month and put in a list
for(i in unique(weather$uniqmonth)) { # for every unique month value
  # create object nam that is of the format 'df.uniqmonth'
  nam <- paste("df",i,sep = ".")
  # create object dat that contains all data for each unique month
  dat <- weather[weather$uniqmonth==i,]
  # add a row of data at the start of each dataframe with the days in month,month abbr.,year
  dat <- add_row(dat,year = NA,month = NA,day = NA,charmonth = NA,uniqmonth = NA,uniqdate = NA,
                 # the line below is the info we are adding in the columns we will keep
                 temp = na.omit(max(dat$day)),precip = unique(dat$charmonth),yr = unique(dat$year),.before = 1)
  # just keep required columns
  dat <- data.frame(dat$temp,dat$precip,dat$yr)
  # add new dataframe to a list,using the new name
  newdat[[nam]] <- dat
}
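You can run the loop, or run it line by line (setting i = 199901), and the error is the same:
Error: Can't combine ..1$precip and ..2$precip.
Ultimately, I should be able to run the following to get the output I need, which I then finish in a text editor (removing the trailing commas).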
# Merge all data into a dataframe
full_data <- do.call("rbind",newdat)
# turn NA's into blanks
full_data[is.na(full_data)] <- ""
This is the final product I need:
a <- c("Jan",round(rnorm(31,15,5),digits = 2),"Feb",round(rnorm(28,15,5),digits = 2),"Mar",round(rnorm(31,15,5),digits = 2))
b <- c(31,rlnorm(31),28,rlnorm(28),31,rlnorm(31))
c <- c(1999,rep(NA,31),1999,rep(NA,28),1999,rep(NA,31))
final_data <- data.frame(temp = a,precip = round(b,digits=2),year = c)
Solution
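On the error itself: it appears to come from add_row combining rows under strict type rules (as far as I can tell, since tibble 3.0.0, which relies on vctrs). In dat, precip is numeric, while the new row supplies a character month abbreviation for it. The sketch below uses a made-up two-row data frame (the values and the names dat_chr and fixed are only for illustration) to reproduce the clash and show one possible way around it, converting the columns to character first.

```r
library(tibble)

# Toy stand-in for one month of data (hypothetical values)
dat <- data.frame(temp = c(13.83, 10.76), precip = c(2.03, 2.53), yr = NA)

# Putting a character value into the numeric precip column is expected to
# fail under recent tibble versions; tryCatch captures the message if it does
res <- tryCatch(
  add_row(dat, temp = 31, precip = "Jan", yr = 1999, .before = 1),
  error = function(e) conditionMessage(e)
)
print(res)

# Converting every column to character first lets the rows combine
dat_chr <- data.frame(lapply(dat, as.character), stringsAsFactors = FALSE)
fixed <- add_row(dat_chr, temp = "31", precip = "Jan", yr = "1999", .before = 1)
print(fixed)
```

This keeps the add_row approach from the question; the rest of this answer takes a different route because of the shape of the target file.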
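After a lengthy discussion, it turns out the end result is not a conventional CSV, so it takes a little bending.
Given that weather starts out looking like this: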
head(weather)
# # A tibble: 6 x 9
#    year month   day charmonth uniqmonth uniqdate  temp precip yr
#   <dbl> <dbl> <dbl> <chr>         <dbl>    <dbl> <dbl>  <dbl> <lgl>
# 1  1999     1     1 Jan          199901   199911 13.8    2.03 NA
# 2  1999     1     2 Jan          199901   199912 10.8    2.53 NA
# 3  1999     1     3 Jan          199901   199913  8.78   3.15 NA
# 4  1999     1     4 Jan          199901   199914 14.3    0.63 NA
# 5  1999     1     5 Jan          199901   199915 18.5    0.47 NA
# 6  1999     1     6 Jan          199901   199916 10.4    0.39 NA
the desired output (full_data) looks like this in the file:
Jan,31,1999
13.83,2.03
10.76,2.53
8.78,3.15
...truncated...
18.74,0.79
Feb,28,1999
17.47,1.62
9.15,0.48
...truncated...
18.36,2.26
Mar,31,1999
20.53,2.65
11.1,2.58
19.52,0.33
...truncated...
The key is that the output is really two columns, temp and precip, but each "month" needs a three-column header.
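I think the simplest way to handle this is to first group_by the main grouping variable (uniqmonth), and then do something to each group. That "something" is: create a new header row from charmonth, max(day), and year, and bind it on top of that month's temp and precip values (converted to character so the columns combine). Since the header has one more comma than the rest of the file, I insert that comma into the first field and tell write.table not to quote it. It is a workaround, but... it works.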
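To see why the embedded comma works, here is a small sketch with a made-up two-row data frame (demo and tmp are names introduced only for this illustration). With quote = FALSE, write.table writes the first field as-is, so its comma becomes an extra separator in the file.

```r
# Hypothetical example: one header row plus one data row, both as character
demo <- data.frame(
  temp = c(paste("Jan", 31, sep = ","), "13.83"),
  precip = c("1999", "2.03"),
  stringsAsFactors = FALSE
)

# Write to a temporary file without quoting, headers, or row names
tmp <- tempfile(fileext = ".csv")
write.table(demo, file = tmp, quote = FALSE, sep = ",",
            row.names = FALSE, col.names = FALSE)

# Read the raw lines back to inspect what was written
out <- readLines(tmp)
print(out)
# [1] "Jan,31,1999" "13.83,2.03"
```

The header line ends up with three fields and the data line with two, which is the layout the question asks for.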
library(dplyr)
weather %>%
  group_by(uniqmonth) %>%
  do({
    bind_rows(
      # header row: "Mon,days" goes in temp (embedded comma), year goes in precip
      tibble(temp = paste(.$charmonth[1],max(.$day),sep = ","),precip = as.character(.$year[1])),
      # data rows, converted to character so they combine with the header row
      mutate_all(select(.,temp,precip),as.character)
    )
  }) %>%
  ungroup() %>%
  select(-uniqmonth) %>%
  write.table(.,file = "quux.csv",quote = FALSE,sep = ",",row.names = FALSE,col.names = FALSE)
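If you would rather avoid the embedded-comma trick, the same file layout can probably be built as plain text lines in base R. This is only a sketch under assumptions: weather_demo is a made-up stand-in for weather, and month_lines is a hypothetical helper, not part of any package.

```r
# Toy stand-in for weather (hypothetical values), two days in each of two months
weather_demo <- data.frame(
  year = 1999,
  day = c(1, 2, 1, 2),
  charmonth = c("Jan", "Jan", "Feb", "Feb"),
  uniqmonth = c(199901, 199901, 199902, 199902),
  temp = c(13.83, 10.76, 17.47, 9.15),
  precip = c(2.03, 2.53, 1.62, 0.48),
  stringsAsFactors = FALSE
)

# Build the lines for one month: a three-field header, then temp,precip rows
month_lines <- function(d) {
  header <- paste(d$charmonth[1], max(d$day), d$year[1], sep = ",")
  c(header, paste(d$temp, d$precip, sep = ","))
}

# split() returns the groups in sorted uniqmonth order
out_lines <- unlist(
  lapply(split(weather_demo, weather_demo$uniqmonth), month_lines),
  use.names = FALSE
)

# Write the lines to a temporary file instead of quux.csv
out_file <- tempfile(fileext = ".csv")
writeLines(out_lines, out_file)
print(out_lines)
```

Because each line is assembled with paste, no column ever has to hold mixed types, so the type clash from the question does not come up.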