Problem description
receptor | year | month | day | hour | hour.inc | lat | lon | height | pressure | date |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2018 | 1 | 3 | 19 | 0 | 31.768 | -106.501 | 500.0 | 835.6 | 2018-01-03 19:00:00 |
1 | 2018 | 1 | 3 | 18 | -1 | 31.628 | -106.350 | 508.8 | 840.5 | 2018-01-03 18:00:00 |
1 | 2018 | 1 | 3 | 17 | -2 | 31.489 | -106.180 | 526.2 | 839.4 | 2018-01-03 17:00:00 |
1 | 2018 | 1 | 3 | 16 | -3 | 31.372 | -105.974 | 547.6 | 836.8 | 2018-01-03 16:00:00 |
1 | 2018 | 1 | 3 | 15 | -4 | 31.289 | -105.731 | 555.3 | 829.8 | 2018-01-03 15:00:00 |
1 | 2018 | 1 | 3 | 14 | -5 | 31.265 | -105.462 | 577.8 | 812.8 | 2018-01-03 14:00:00 |
1 | 2018 | 1 | 3 | 13 | -6 | 31.337 | -105.175 | 640.0 | 793.9 | 2018-01-03 13:00:00 |
1 | 2018 | 1 | 3 | 12 | -7 | 31.492 | -104.897 | 645.6 | 809.2 | 2018-01-03 12:00:00 |
1 | 2018 | 1 | 3 | 11 | -8 | 31.671 | -104.700 | 686.8 | 801.0 | 2018-01-03 11:00:00 |
1 | 2018 | 1 | 3 | 10 | -9 | 31.913 | -104.552 | 794.2 | 795.8 | 2018-01-03 10:00:00 |
2 | 2018 | 1 | 4 | 19 | 0 | 31.768 | -106.501 | 500.0 | 830.9 | 2018-01-04 19:00:00 |
2 | 2018 | 1 | 4 | 18 | -1 | 31.904 | -106.635 | 611.5 | 819.5 | 2018-01-04 18:00:00 |
2 | 2018 | 1 | 4 | 17 | -2 | 32.070 | -106.749 | 709.7 | 808.0 | 2018-01-04 17:00:00 |
2 | 2018 | 1 | 4 | 16 | -3 | 32.223 | -106.855 | 787.3 | 794.9 | 2018-01-04 16:00:00 |
The above is what my data frame looks like, but I am trying to create a new column named date1 so that it looks like the frame below.
receptor year month day hour hour.inc lat lon height pressure date date1
1 1 2018 1 3 19 0 31.768 -106.501 500.0 835.6 2018-01-03 19:00:00 2018-01-03 19:00:00
2 1 2018 1 3 18 -1 31.628 -106.350 508.8 840.5 2018-01-03 18:00:00 2018-01-03 19:00:00
3 1 2018 1 3 17 -2 31.489 -106.180 526.2 839.4 2018-01-03 17:00:00 2018-01-03 19:00:00
4 1 2018 1 3 16 -3 31.372 -105.974 547.6 836.8 2018-01-03 16:00:00 2018-01-03 19:00:00
5 1 2018 1 3 15 -4 31.289 -105.731 555.3 829.8 2018-01-03 15:00:00 2018-01-03 19:00:00
6 1 2018 1 3 14 -5 31.265 -105.462 577.8 812.8 2018-01-03 14:00:00 2018-01-03 19:00:00
7 1 2018 1 3 13 -6 31.337 -105.175 640.0 793.9 2018-01-03 13:00:00 2018-01-03 19:00:00
8 1 2018 1 3 12 -7 31.492 -104.897 645.6 809.2 2018-01-03 12:00:00 2018-01-03 19:00:00
9 1 2018 1 3 11 -8 31.671 -104.700 686.8 801.0 2018-01-03 11:00:00 2018-01-03 19:00:00
10 1 2018 1 3 10 -9 31.913 -104.552 794.2 795.8 2018-01-03 10:00:00 2018-01-03 19:00:00
38 2 2018 1 4 19 0 31.768 -106.501 500.0 830.9 2018-01-04 19:00:00 2018-01-04 19:00:00
39 2 2018 1 4 18 -1 31.904 -106.635 611.5 819.5 2018-01-04 18:00:00 2018-01-04 19:00:00
40 2 2018 1 4 17 -2 32.070 -106.749 709.7 808.0 2018-01-04 17:00:00 2018-01-04 19:00:00
41 2 2018 1 4 16 -3 32.223 -106.855 787.3 794.9 2018-01-04 16:00:00 2018-01-04 19:00:00
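Ignore the leftmost index. I want to match each receptor (e.g. 1, 2) with the first date on which it appears (e.g. 2018-01-03 19:00:00, 2018-01-04 19:00:00) and repeat that date until the receptor changes.
I am working in R, so I would prefer an R solution, but I can also use a Python solution through the reticulate package in R.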
Solution
Using data.table
You can try
library(data.table)
setDT(df) # convert df to a data.table by reference
df[, date1 := date[1], by = receptor] # take the first date within each receptor group
df
#Output
receptor year month day hour hour.inc lat lon height pressure date date1
1: 1 2018 1 3 19 0 31.768 -106.501 500.0 835.6 2018-01-03 19:00:00 2018-01-03 19:00:00
2: 1 2018 1 3 18 -1 31.628 -106.350 508.8 840.5 2018-01-03 18:00:00 2018-01-03 19:00:00
3: 1 2018 1 3 17 -2 31.489 -106.180 526.2 839.4 2018-01-03 17:00:00 2018-01-03 19:00:00
4: 1 2018 1 3 16 -3 31.372 -105.974 547.6 836.8 2018-01-03 16:00:00 2018-01-03 19:00:00
5: 1 2018 1 3 15 -4 31.289 -105.731 555.3 829.8 2018-01-03 15:00:00 2018-01-03 19:00:00
6: 1 2018 1 3 14 -5 31.265 -105.462 577.8 812.8 2018-01-03 14:00:00 2018-01-03 19:00:00
7: 1 2018 1 3 13 -6 31.337 -105.175 640.0 793.9 2018-01-03 13:00:00 2018-01-03 19:00:00
8: 1 2018 1 3 12 -7 31.492 -104.897 645.6 809.2 2018-01-03 12:00:00 2018-01-03 19:00:00
9: 1 2018 1 3 11 -8 31.671 -104.700 686.8 801.0 2018-01-03 11:00:00 2018-01-03 19:00:00
10: 1 2018 1 3 10 -9 31.913 -104.552 794.2 795.8 2018-01-03 10:00:00 2018-01-03 19:00:00
11: 2 2018 1 4 19 0 31.768 -106.501 500.0 830.9 2018-01-04 19:00:00 2018-01-04 19:00:00
12: 2 2018 1 4 18 -1 31.904 -106.635 611.5 819.5 2018-01-04 18:00:00 2018-01-04 19:00:00
13: 2 2018 1 4 17 -2 32.070 -106.749 709.7 808.0 2018-01-04 17:00:00 2018-01-04 19:00:00
14: 2 2018 1 4 16 -3 32.223 -106.855 787.3 794.9 2018-01-04 16:00:00 2018-01-04 19:00:00
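For readers taking the Python route mentioned in the question, the same "first value per group, repeated on every row" idea can be sketched in pandas with groupby(...).transform("first"). The five-row frame below is a made-up stand-in that keeps only the receptor and date columns; it is an illustration, not the asker's data.

```python
import pandas as pd

# Small illustrative stand-in for the question's frame (assumed, not the real data).
df = pd.DataFrame({
    "receptor": [1, 1, 1, 2, 2],
    "date": pd.to_datetime([
        "2018-01-03 19:00:00", "2018-01-03 18:00:00", "2018-01-03 17:00:00",
        "2018-01-04 19:00:00", "2018-01-04 18:00:00",
    ]),
})

# Broadcast the first date of each receptor group back onto every row,
# analogous to evaluating date[1] by receptor in data.table.
df["date1"] = df.groupby("receptor")["date"].transform("first")
print(df)
```

Like the data.table version, this groups by receptor value, so rows of the same receptor share one date1 even if they are not adjacent.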
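Try filling the positions where the receptor value does not change with np.nan and the positions where it changes with date (the value at that index), then simply forward-fill with .ffill().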
df.receptor.shift().ne(df.receptor)
gives you the positions where the receptor value changes. It compares each value with the previous one to detect a change.
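To make the change mask concrete, here is a small sketch on a standalone Series. The values and the name changed are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Illustrative receptor ids (assumed values).
receptor = pd.Series([1, 1, 1, 2, 2, 3])

# shift() moves every value down one row, so row i is compared with row i-1.
# The first row is compared with NaN and therefore always counts as a change.
changed = receptor.shift().ne(receptor)
print(changed.tolist())  # [True, False, False, True, False, True]
```

Each True marks the first row of a new run of identical receptor values, which is exactly where a new date should start.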
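import numpy as np
# the first row of each receptor run gets its date, every other row NaN; then forward-fill
df['date1'] = np.where(df.receptor.shift().ne(df.receptor), df.date, np.nan)
df.date1 = df.date1.ffill()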
| | receptor | year | month | day | hour | hour.inc | lat | lon | height | pressure | date | date1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2018 | 1 | 3 | 19 | 0 | 31.768 | -106.501 | 500.0 | 835.6 | 2018-01-03 19:00:00 | 2018-01-03 19:00:00 |
1 | 1 | 2018 | 1 | 3 | 18 | -1 | 31.628 | -106.350 | 508.8 | 840.5 | 2018-01-03 18:00:00 | 2018-01-03 19:00:00 |
2 | 1 | 2018 | 1 | 3 | 17 | -2 | 31.489 | -106.180 | 526.2 | 839.4 | 2018-01-03 17:00:00 | 2018-01-03 19:00:00 |
3 | 1 | 2018 | 1 | 3 | 16 | -3 | 31.372 | -105.974 | 547.6 | 836.8 | 2018-01-03 16:00:00 | 2018-01-03 19:00:00 |
4 | 1 | 2018 | 1 | 3 | 15 | -4 | 31.289 | -105.731 | 555.3 | 829.8 | 2018-01-03 15:00:00 | 2018-01-03 19:00:00 |
5 | 1 | 2018 | 1 | 3 | 14 | -5 | 31.265 | -105.462 | 577.8 | 812.8 | 2018-01-03 14:00:00 | 2018-01-03 19:00:00 |
6 | 1 | 2018 | 1 | 3 | 13 | -6 | 31.337 | -105.175 | 640.0 | 793.9 | 2018-01-03 13:00:00 | 2018-01-03 19:00:00 |
7 | 1 | 2018 | 1 | 3 | 12 | -7 | 31.492 | -104.897 | 645.6 | 809.2 | 2018-01-03 12:00:00 | 2018-01-03 19:00:00 |
8 | 1 | 2018 | 1 | 3 | 11 | -8 | 31.671 | -104.700 | 686.8 | 801.0 | 2018-01-03 11:00:00 | 2018-01-03 19:00:00 |
9 | 1 | 2018 | 1 | 3 | 10 | -9 | 31.913 | -104.552 | 794.2 | 795.8 | 2018-01-03 10:00:00 | 2018-01-03 19:00:00 |
10 | 2 | 2018 | 1 | 4 | 19 | 0 | 31.768 | -106.501 | 500.0 | 830.9 | 2018-01-04 19:00:00 | 2018-01-04 19:00:00 |
11 | 2 | 2018 | 1 | 4 | 18 | -1 | 31.904 | -106.635 | 611.5 | 819.5 | 2018-01-04 18:00:00 | 2018-01-04 19:00:00 |
12 | 2 | 2018 | 1 | 4 | 17 | -2 | 32.070 | -106.749 | 709.7 | 808.0 | 2018-01-04 17:00:00 | 2018-01-04 19:00:00 |
13 | 2 | 2018 | 1 | 4 | 16 | -3 | 32.223 | -106.855 | 787.3 | 794.9 | 2018-01-04 16:00:00 | 2018-01-04 19:00:00 |
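When date holds real datetime64 values rather than strings, mixing it with a float np.nan inside np.where can be awkward: depending on the NumPy and pandas versions it may raise or fall back to object dtype. A minimal sketch of the same "mark the run start, then forward-fill" idea using Series.where, which leaves NaT in the non-start rows and keeps the datetime dtype. The frame is an illustrative stand-in and the name is_start is hypothetical.

```python
import pandas as pd

# Illustrative data (assumed): receptor 1 reappears in the last row.
df = pd.DataFrame({
    "receptor": [1, 1, 2, 2, 1],
    "date": pd.to_datetime([
        "2018-01-03 19:00:00", "2018-01-03 18:00:00",
        "2018-01-04 19:00:00", "2018-01-04 18:00:00",
        "2018-01-05 19:00:00",
    ]),
})

# True on the first row of every run of identical receptor values.
is_start = df["receptor"].shift().ne(df["receptor"])

# Keep the date only on run starts (NaT elsewhere), then carry it forward.
df["date1"] = df["date"].where(is_start).ffill()
print(df)
```

Because this works on consecutive runs, the last row starts a new run and gets its own date, which matches the "repeat until the receptor changes" wording in the question; a plain group-by on receptor would instead reuse the first date of receptor 1.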
Consider base R's ave, after computing a Date column, with head to return the first datetime for each date grouping:
df <- within(df,{
date_short <- as.Date(substr(as.character(date),1,10),origin="1970-01-01")
first_dt_hour <- ave(date,date_short,FUN=function(x) head(x,1))
rm(date_short) # DROP HELPER COLUMN
})
print(df)
# receptor year month day hour hour.inc lat lon height pressure date first_dt_hour
# 1 1 2018 1 3 19 0 31.768 -106.501 500.0 835.6 2018-01-03 19:00:00 2018-01-03 19:00:00
# 2 1 2018 1 3 18 -1 31.628 -106.350 508.8 840.5 2018-01-03 18:00:00 2018-01-03 19:00:00
# 3 1 2018 1 3 17 -2 31.489 -106.180 526.2 839.4 2018-01-03 17:00:00 2018-01-03 19:00:00
# 4 1 2018 1 3 16 -3 31.372 -105.974 547.6 836.8 2018-01-03 16:00:00 2018-01-03 19:00:00
# 5 1 2018 1 3 15 -4 31.289 -105.731 555.3 829.8 2018-01-03 15:00:00 2018-01-03 19:00:00
# 6 1 2018 1 3 14 -5 31.265 -105.462 577.8 812.8 2018-01-03 14:00:00 2018-01-03 19:00:00
# 7 1 2018 1 3 13 -6 31.337 -105.175 640.0 793.9 2018-01-03 13:00:00 2018-01-03 19:00:00
# 8 1 2018 1 3 12 -7 31.492 -104.897 645.6 809.2 2018-01-03 12:00:00 2018-01-03 19:00:00
# 9 1 2018 1 3 11 -8 31.671 -104.700 686.8 801.0 2018-01-03 11:00:00 2018-01-03 19:00:00
# 10 1 2018 1 3 10 -9 31.913 -104.552 794.2 795.8 2018-01-03 10:00:00 2018-01-03 19:00:00
# 38 2 2018 1 4 19 0 31.768 -106.501 500.0 830.9 2018-01-04 19:00:00 2018-01-04 19:00:00
# 39 2 2018 1 4 18 -1 31.904 -106.635 611.5 819.5 2018-01-04 18:00:00 2018-01-04 19:00:00
# 40 2 2018 1 4 17 -2 32.070 -106.749 709.7 808.0 2018-01-04 17:00:00 2018-01-04 19:00:00
# 41 2 2018 1 4 16 -3 32.223 -106.855 787.3 794.9 2018-01-04 16:00:00 2018-01-04 19:00:00
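The base R answer above groups by calendar date rather than by receptor. On the sample data the two coincide, because each receptor's rows fall on a single day. The sketch below, in pandas for easy comparison with the earlier Python answer, shows how the two groupings could differ if a receptor's back-trajectory crossed midnight. The three-row frame and the column names first_by_day and first_by_receptor are assumptions made for illustration only.

```python
import pandas as pd

# Illustrative data (assumed): one receptor whose rows cross midnight.
df = pd.DataFrame({
    "receptor": [1, 1, 1],
    "date": pd.to_datetime([
        "2018-01-03 01:00:00", "2018-01-03 00:00:00", "2018-01-02 23:00:00",
    ]),
})

# Calendar day of each timestamp, used as the grouping key.
day = df["date"].dt.normalize().rename("day")

# First datetime per calendar day versus first datetime per receptor.
df["first_by_day"] = df.groupby(day)["date"].transform("first")
df["first_by_receptor"] = df.groupby("receptor")["date"].transform("first")
print(df)
```

Here first_by_day gives the 23:00 row its own value, while first_by_receptor repeats 2018-01-03 01:00:00 on all three rows, so grouping by receptor is the safer choice whenever trajectories can span more than one day.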
Data
data <- ' receptor year month day hour hour.inc lat lon height pressure date
1 1 2018 1 3 19 0 31.768 -106.501 500.0 835.6 "2018-01-03 19:00:00"
2 1 2018 1 3 18 -1 31.628 -106.350 508.8 840.5 "2018-01-03 18:00:00"
3 1 2018 1 3 17 -2 31.489 -106.180 526.2 839.4 "2018-01-03 17:00:00"
4 1 2018 1 3 16 -3 31.372 -105.974 547.6 836.8 "2018-01-03 16:00:00"
5 1 2018 1 3 15 -4 31.289 -105.731 555.3 829.8 "2018-01-03 15:00:00"
6 1 2018 1 3 14 -5 31.265 -105.462 577.8 812.8 "2018-01-03 14:00:00"
7 1 2018 1 3 13 -6 31.337 -105.175 640.0 793.9 "2018-01-03 13:00:00"
8 1 2018 1 3 12 -7 31.492 -104.897 645.6 809.2 "2018-01-03 12:00:00"
9 1 2018 1 3 11 -8 31.671 -104.700 686.8 801.0 "2018-01-03 11:00:00"
10 1 2018 1 3 10 -9 31.913 -104.552 794.2 795.8 "2018-01-03 10:00:00"
38 2 2018 1 4 19 0 31.768 -106.501 500.0 830.9 "2018-01-04 19:00:00"
39 2 2018 1 4 18 -1 31.904 -106.635 611.5 819.5 "2018-01-04 18:00:00"
40 2 2018 1 4 17 -2 32.070 -106.749 709.7 808.0 "2018-01-04 17:00:00"
41 2 2018 1 4 16 -3 32.223 -106.855 787.3 794.9 "2018-01-04 16:00:00"'
df <- read.table(text=data,colClasses=c(rep("integer",7),rep("numeric",4),"POSIXct"),header=TRUE)