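Problem description
I have only just started using R, so I may not have understood these functions correctly.
I have length and age data from the literature, and I have also measured some specimens of my own. I need to find the age of the specimens I measured, using the growth equation I calculated from the literature data.
Here is a small example from my data: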
#Example data
length <- c(0.06,0.087,0.147,0.241,0.615,1.49,2.42)
age <- c(1.3,2.6,3.3,3.9,5.45,8,10.5)
#Polynomial function second degree
growth <- lm(length ~ poly(age,2,raw=TRUE)-1)
#Need to use this equation to predict x (age) given y (length)
#New y data
mydata_length <- c(0.72,1.82,0.41,0.28)
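To calculate age (x) from my data (y), I tried several approaches, and I thought the function spline() would be the right one. However, it seems to keep giving me only one of the two possible solutions of the polynomial equation, and it is the wrong one.
Because this is a growth curve, it starts at the origin, and I need the positive solution of the equation. If I enter the coefficients into other software to solve the equation, I do get both solutions, including the one I need, which I report here. That approach is very time-consuming, though, and it is not reproducible.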
#Spline doesn't give the correct results
xvals <- spline(x = growth$fitted.values,xout = mydata_length)$y
#xvals
#[1] -0.01614070 0.09184022 -0.05031075 -0.06537486
#Expected results
#5.88 9.08 4.56 3.85
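Is there a function in R that finds the results I am looking for? I also have other data that fit a linear regression better, so an equivalent function that works with a linear model would be useful too.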
Solution
I found a way to get what I wanted. It may not be the prettiest way, but it works.
#Calculate age (x) based on my length data (y)
library(polynom)
#Save coefficients
coeffs <- growth$coefficients
#Create function for growth, with 0 as intercept
growth_f <- polynomial(c(0,coeffs))
#List of specimen names - row names of y data frame
specimens <- c("sp1","sp2","sp3","sp4")
#Row names for x (calculated age) data frame
rows <- c("discard","keep")
#New data frame with y (length) data
mydata_length_df <- data.frame(mydata_length,row.names = specimens)
#Matrix for calculated age results
x <- matrix(nrow = 2,ncol = nrow(mydata_length_df))
#Make data frame
x <- data.frame(x)
#Set row and column names
rownames(x) <- rows
colnames(x) <- specimens
#Loop to calculate both results of the polynomial for all specimens - second row is the one to keep
for(i in 1:4) {
x[i] <- data.frame(solve(growth_f,mydata_length_df[i,]))
}
x
# sp1 sp2 sp3 sp4
#discard -4.996448 -8.189725 -3.671227 -2.965612
#keep 5.889859 9.083136 4.564638 3.859024
#Transpose to match rest of the data
x <- t(x)
#Data frame with both length and calculated age
new_results <- data.frame(length = mydata_length,calc_age = x[,2])
new_results
# length calc_age
#sp1 0.72 5.889859
#sp2 1.82 9.083136
#sp3 0.41 4.564638
#sp4 0.28 3.859024
[Figure: plot with expected and calculated results]
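The same roots can also be obtained without an extra package. The following is a minimal sketch, assuming the quadratic through the origin from the question and using base R's polyroot(); the helper name age_from_length is hypothetical and not part of the original answer. It solves b1*x + b2*x^2 - y = 0 for each measured length and keeps the positive real root.

```r
# Example data from the question
length <- c(0.06, 0.087, 0.147, 0.241, 0.615, 1.49, 2.42)
age <- c(1.3, 2.6, 3.3, 3.9, 5.45, 8, 10.5)
growth <- lm(length ~ poly(age, 2, raw = TRUE) - 1)
mydata_length <- c(0.72, 1.82, 0.41, 0.28)

# Hypothetical helper: solve b1*x + b2*x^2 - y = 0 and keep the positive real root.
# polyroot() expects the coefficients in increasing order of power.
age_from_length <- function(y, coeffs) {
  roots <- polyroot(c(-y, coeffs))
  real_roots <- Re(roots)[abs(Im(roots)) < 1e-8]  # drop complex roots
  real_roots[real_roots > 0]                      # growth curve: positive age only
}

# Assumes exactly one positive root per length, which holds for this fit
calc_age <- sapply(mydata_length, age_from_length, coeffs = unname(coef(growth)))

new_results <- data.frame(length = mydata_length, calc_age = calc_age,
                          row.names = c("sp1", "sp2", "sp3", "sp4"))
new_results
```

With the example data, calc_age should match the "keep" row above. If a fit ever produced zero or several positive roots for some length, sapply() would return a list instead of a numeric vector, so that case would need separate handling.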
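Alternative solution
If length is used to predict age, the formula should be age ~ f(length). A model formula has the form predicted ~ predictors, with the predicted variable on the left.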
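#Length predicts age (uses the length and age vectors from the question)
growth <- lm(age ~ poly(length,2,raw=TRUE))
#Gridpoints: lengths and the ages predicted for them
Length=seq(0,2.5,length.out=100)
Age=predict(growth,list(length=Length))
#New predictors
lengths=c(0.72,1.82,0.41,0.28)
preds=predict(growth,list(length=lengths))
preds
#       1        2        3        4
#5.642787 9.305444 4.204913 3.548846
#Expected ages reported in the question
actual=c(5.88,9.08,4.56,3.85)
#Empty plot, then fitted curve (red), predictions (purple), expected values (blue)
plot(Age,Length,type="n")
lines(Age,Length,col="red")
points(preds,lengths,pch=23,col="purple")
points(actual,lengths,pch=16,col="blue")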
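The question also asks for something that works with a linear model. One possible approach, sketched here under assumptions rather than taken from either answer, is to keep the model as length ~ f(age) and invert it numerically with uniroot(). The helper invert_fit and the search interval c(0, 15) are hypothetical choices; the interval must bracket exactly one solution for every measured length. The straight-line fit below uses the question's example data only to show that the same call works, not because a line suits these data.

```r
# Example data from the question
length <- c(0.06, 0.087, 0.147, 0.241, 0.615, 1.49, 2.42)
age <- c(1.3, 2.6, 3.3, 3.9, 5.45, 8, 10.5)
mydata_length <- c(0.72, 1.82, 0.41, 0.28)

# Hypothetical helper: find the age at which the fitted length equals y,
# searching only inside `interval` (assumed to bracket exactly one solution)
invert_fit <- function(model, y, interval) {
  sapply(y, function(yi) {
    f <- function(a) unname(predict(model, newdata = data.frame(age = a))) - yi
    uniroot(f, interval = interval, tol = 1e-9)$root
  })
}

# Quadratic growth curve through the origin, as in the question
growth_quad <- lm(length ~ poly(age, 2, raw = TRUE) - 1)
age_quad <- invert_fit(growth_quad, mydata_length, interval = c(0, 15))

# Straight-line fit, shown only to illustrate the same call on a linear model
growth_lin <- lm(length ~ age)
age_lin <- invert_fit(growth_lin, mydata_length, interval = c(0, 15))

# For a straight line the inverse also has a closed form: (y - intercept) / slope
age_lin_exact <- (mydata_length - coef(growth_lin)[[1]]) / coef(growth_lin)[[2]]
```

Restricting the search to non-negative ages is what discards the negative root of the quadratic. Note that inverting length ~ f(age) and fitting age ~ f(length) directly are different models, so their estimated ages need not agree.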