Problem description
I copied and pasted weather data from the Weather Underground page below to do some data analysis. The data looks like this:
https://www.wunderground.com/dashboard/pws/KCACHINO13/table/2018-04-10/2018-04-10/daily
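As you can see, the temperature and the other fields include text, so I cannot do any calculations. In Excel I used replace(xx, " F", "") to remove the F from the "Temperature" column, but when I then tried convert(xx, "F", "C") to convert Fahrenheit to Celsius, I could not get a result. I think something is wrong with the data itself. I tried formatting the cells as numbers, and I tried copying and pasting the values into a new column, but neither worked.
Then I imported the data into R as a data.frame and tried to reformat it there. I checked the class of the Temperature column, which is "character":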
class(a$Temperature)
#"character"
a$Temperature <- gsub("F","",a$Temperature)
# this command removed "F"
as.numeric(a$Temperature)
#Warning message: NAs introduced by coercion
as.numeric(unlist(a$Temperature))
#still the same warning message
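In Excel, I also created a new column with the "F" removed from the temperature and used that column in R to convert from character to numeric, but I still get the warning message. I don't know how to handle this. Can someone help me? Thanks!
As suggested below, I am pasting the output: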
dput(head(a))
#structure(list(Time = structure(c(-2209075140,-2209074840,-2209074540,-2209074240,-2209073940,-2209073640),tzone = "UTC",class = c("POSIXct","POSIXt")),Temperature = c("60.0 ","59.9 ","59.8 ","59.7 ","59.6 ","59.5 "),`T(F)` = c("60.0 ",`Dew Point` = c("48.2 F","48.1 F","48.4 F","48.3 F","48.2 F","48.1 F"),Humidity = c("65 %","65 %","66 %","66 %"),Wind = c("WSW","WSW","WSW"),Speed = c("0.0 mph","0.0 mph","0.0 mph"),Gust = c("0.0 mph",Pressure = c("29.88 in","29.88 in","29.88 in"),`Precip. Rate.` = c("0.00 in","0.00 in","0.00 in"),`Precip. Accum.` = c("0.00 in",UV = c(0,0),Solar = c("0 w/m²","0 w/m²","0 w/m²")),row.names = c(NA,-6L),class = c("tbl_df","tbl","data.frame"))
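The Temperature values in the output above, such as "60.0 ", still end in a character after the digits. One possible explanation, which is an assumption and not confirmed anywhere in this post, is that the character is not a plain space (for example a non-breaking space, U+00A0, left over from the web page). A minimal sketch for inspecting and removing such characters; the vector `x` is made-up sample data:

```r
# Hypothetical sample: the trailing character is ASSUMED to be a
# non-breaking space (U+00A0), which prints like an ordinary space.
x <- c("60.0\u00a0", "59.9\u00a0")

# List the Unicode code points of the first value; 160 is U+00A0,
# whereas an ordinary space would show up as 32.
utf8ToInt(x[1])

# Keep only digits, the decimal point and a minus sign, then convert.
clean <- as.numeric(gsub("[^0-9.-]", "", x))
clean
```

If utf8ToInt() shows only 32 after the digits, the trailing character is an ordinary space and the cause lies elsewhere.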
Solution
If you only want to convert the "Temperature" column, you can consider the following option.
Data
# Time and Temperature are the only columns the code below uses.
df <- structure(list(Time = c("12:04 AM","12:09 AM","12:14 AM","12:19 AM","12:24 AM","12:29 AM"),Temperature = c("69.4 F","69.2 F","68.8 F","68.5 F","68.3 F","68.0 F")),class = "data.frame",row.names = c(NA,-6L))
Code
library(dplyr)
library(stringr)
df2 <- df %>%
  mutate(Temperature2 = as.numeric(str_extract(Temperature, "[\\d\\.]+"))) %>%
  relocate(Temperature2, .after = Temperature)
df2[,2:3]
#   Temperature Temperature2
# 1      69.4 F         69.4
# 2      69.2 F         69.2
# 3      68.8 F         68.8
# 4      68.5 F         68.5
# 5      68.3 F         68.3
# 6      68.0 F         68.0
str(df2$Temperature2)
# num [1:6] 69.4 69.2 68.8 68.5 68.3 68
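The question also asked about converting Fahrenheit to Celsius. Once the column is numeric, that is plain arithmetic: C = (F - 32) * 5 / 9. A minimal base R sketch; the helper name `f_to_c` is hypothetical and not part of the answer above:

```r
# Hypothetical helper (not from the original answer): Fahrenheit to Celsius.
f_to_c <- function(f) (f - 32) * 5 / 9

# The numeric values shown in Temperature2 above.
temp_f <- c(69.4, 69.2, 68.8, 68.5, 68.3, 68.0)

# Convert and round to one decimal place, matching the input precision.
temp_c <- round(f_to_c(temp_f), 1)
temp_c
```

Inside the dplyr pipeline the same helper could be used in a further mutate() call on Temperature2.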
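Maybe this helps. Several functions are nested in this call, including one that converts a character variable to numeric. There is also gsub, which replaces commas with an empty string. You only need to change the comma to whatever it is you want to replace. I have never tried it with letters, but it may be a solution. Here is the code: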
data666
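The apply function applies a function across the entire data set. The 2 means it works column by column. If you want to go row by row instead, change the 2 to 1.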
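The code for this answer is cut off in the post, so the following is only a sketch of the idea it describes (apply with margin 2, wrapping gsub and as.numeric), not the author's actual code. The data frame `weather` and the helper `strip_units` are hypothetical names used for illustration:

```r
# Hypothetical data frame; `weather` and its values are illustrative only.
weather <- data.frame(
  Temperature = c("69.4 F", "69.2 F"),
  Humidity    = c("43 %", "44 %"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop everything except digits, "." and "-",
# then convert what is left to numeric.
strip_units <- function(x) as.numeric(gsub("[^0-9.-]", "", x))

# MARGIN = 2 applies strip_units() to each column; apply() returns a
# numeric matrix, which is turned back into a data frame here.
cleaned <- as.data.frame(apply(weather, 2, strip_units))
str(cleaned)
```

A column with no digits at all, such as a wind direction, would come out as NA under this approach, so it is probably best applied only to the measurement columns.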