How to look up a single value based on a condition across two different datasets that share two identical columns

Problem description

I have two different datasets, shown below:

[Image: the two datasets]

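I need to fill the call volume, put volume, and total volume columns in dataset #2, but only where both the ID and the date columns match between the two datasets. Calls, puts, and totals are distinguished by the value in column 3 of dataset #1 (C for calls, P for puts, T for total).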

I'm running the code below, but it doesn't work (only the call example is shown; puts and totals follow the same rule).

dataset2$call_volume <- if (dataset1$optiontype == "C") {
  dataset1$volume[match(interaction(dataset2$ID, dataset2$date),
                        interaction(dataset1$ID, dataset1$date))]
}

Does anyone have suggestions on how to approach this? Thanks a lot!

> dput(dataset1)
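structure(list(ID = c(44652, 44652, 44652, 56266, 56266, 56266, 44652, 44652, 44652, 56266, 56266, 56266), date = c("1997/01/02", "1997/01/02", "1997/01/02", "1997/01/02", "1997/01/02", "1997/01/02", "1997/01/03", "1997/01/03", "1997/01/03", "1997/01/03", "1997/01/03", "1997/01/03"), `option type (C,P,T: for calls,puts,and total)` = c("C", "P", "T", "C", "P", "T", "C", "P", "T", "C", "P", "T"), volume = c(34, 250, 284, 30, 0, 30, 1443, 211, 1654, 4490, 826, 5316)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))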

> dput(dataset2)
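structure(list(ID = c(44652, 44652, 44652, 56266, 56266, 56266), date = c("1997/01/02", "1997/01/03", "1997/01/04", "1997/01/02", "1997/01/03", "1997/01/04"), `call volume` = c(NA, NA, NA, NA, NA, NA), `put volume` = c(NA, NA, NA, NA, NA, NA), `total volume` = c(NA, NA, NA, NA, NA, NA)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))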

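Update: both datasets have many other columns that differ a lot from each other; the only columns they have in common are the ones shown in the image and in the `dput()` output.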

Solution

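I think this is an X/Y problem. What you are actually trying to do is reshape dataset1 into wide format (one column per option type) so that it can fill dataset2. After that, you can `left_join()` the two frames on ID and date.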
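To make the reshaping step concrete, here is a minimal sketch on a hypothetical three-row toy frame (the names `long`, `wide`, and the ID value 1 are made up for illustration). `pivot_wider()` turns the C/P/T rows for one ID/date into three columns on a single row:

```r
library(tidyr)

# Hypothetical toy data in "long" format: one row per ID/date/option type
long <- data.frame(
  ID          = c(1, 1, 1),
  date        = c("1997/01/02", "1997/01/02", "1997/01/02"),
  option_type = c("C", "P", "T"),
  volume      = c(34, 250, 284)
)

# Spread option_type into columns: one row per ID/date, one column per type
wide <- pivot_wider(long, names_from = option_type, values_from = volume)
print(wide)
```

Once the data has this shape, each ID/date pair appears exactly once, so a join on those two columns can fill all three volume columns at the same time.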

library(tidyr)
library(dplyr)

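# Give the long "option type (...)" column a name that is easy to type
names(dataset1)[3] <- "option_type"

dataset2 %>%
  # Drop the empty placeholder columns; they are rebuilt by the join
  dplyr::select(-`call volume`, -`put volume`, -`total volume`) %>%
  left_join(
    dataset1 %>%
      # One row per ID/date, one column per option type (C, P, T)
      tidyr::pivot_wider(names_from = "option_type", values_from = "volume") %>%
      rename("Call Volume" = C, "Put Volume" = P, "Total Volume" = `T`),
    by = c("ID", "date")
  )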
#> # A tibble: 6 x 5
#>      ID date       `Call Volume` `Put Volume` `Total Volume`
#>   <dbl> <chr>              <dbl>        <dbl>          <dbl>
#> 1 44652 1997/01/02            34          250            284
#> 2 44652 1997/01/03          1443          211           1654
#> 3 44652 1997/01/04            NA           NA             NA
#> 4 56266 1997/01/02            30            0             30
#> 5 56266 1997/01/03          4490          826           5316
#> 6 56266 1997/01/04            NA           NA             NA

Created on 2020-10-07 by the reprex package (v0.3.0)
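If you would rather keep the `match()` idea from the question, a minimal base-R sketch follows. The original code fails partly because `if()` expects a single TRUE/FALSE, not a whole column (R 4.2.0 and later raise an error for a longer condition; older versions warned and used only the first element), and because the `dput()` output has no column called `optiontype`. The sketch instead subsets the rows for one option type first and then matches on a combined ID/date key. The frames `ds1`/`ds2` are a hypothetical subset of the question's data with the type column renamed to `option_type`, and `key()` and `lookup_volume()` are illustrative helpers, not part of any package:

```r
# Hypothetical subset of the question's data (type column renamed to option_type)
ds1 <- data.frame(
  ID          = c(44652, 44652, 44652, 56266, 56266, 56266),
  date        = rep("1997/01/02", 6),
  option_type = c("C", "P", "T", "C", "P", "T"),
  volume      = c(34, 250, 284, 30, 0, 30)
)
ds2 <- data.frame(
  ID   = c(44652, 44652, 56266),
  date = c("1997/01/02", "1997/01/04", "1997/01/02")
)

# Combine the two matching columns into one lookup key per row
key <- function(df) paste(df$ID, df$date, sep = "_")

# Keep only the rows of one option type, then look up each ds2 key.
# match() returns NA where ds2 has no ID/date partner, so the volume is NA too.
lookup_volume <- function(lookup, target, type) {
  sub <- lookup[lookup$option_type == type, ]
  sub$volume[match(key(target), key(sub))]
}

ds2$call_volume  <- lookup_volume(ds1, ds2, "C")
ds2$put_volume   <- lookup_volume(ds1, ds2, "P")
ds2$total_volume <- lookup_volume(ds1, ds2, "T")
print(ds2)
```

This needs one lookup per option type, which is why reshaping once and joining (as above) scales better when there are more types or columns.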


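If I understand correctly, you want dataset 2 to pick up the matching values from dataset 1, and to stay empty (NA) where there is no match.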

In that case, you need `left_join()`.

If not, please update your question with the desired output.

library(tidyverse)

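d1 <- structure(list(ID = c(44652, 44652, 44652, 56266, 56266, 56266, 44652, 44652, 44652, 56266, 56266, 56266), date = c("1997/01/02", "1997/01/02", "1997/01/02", "1997/01/02", "1997/01/02", "1997/01/02", "1997/01/03", "1997/01/03", "1997/01/03", "1997/01/03", "1997/01/03", "1997/01/03"), `option type (C,P,T: for calls,puts,and total)` = c("C", "P", "T", "C", "P", "T", "C", "P", "T", "C", "P", "T"), volume = c(34, 250, 284, 30, 0, 30, 1443, 211, 1654, 4490, 826, 5316)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))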


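d2 <- structure(list(ID = c(44652, 44652, 44652, 56266, 56266, 56266), date = c("1997/01/02", "1997/01/03", "1997/01/04", "1997/01/02", "1997/01/03", "1997/01/04"), `call volume` = c(NA, NA, NA, NA, NA, NA), `put volume` = c(NA, NA, NA, NA, NA, NA), `total volume` = c(NA, NA, NA, NA, NA, NA)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))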

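# Reshape to one row per ID/date, with one column per option type (C, P, T)
d1_wider <- d1 %>%
  pivot_wider(names_from = `option type (C,P,T: for calls,puts,and total)`, values_from = volume) %>%
  rename(`call volume` = `C`, `put volume` = `P`, `total volume` = `T`)

# Keep only the key columns of d2, then fill the volumes by joining on ID and date
d2 %>%
  select(ID, date) %>%
  left_join(d1_wider)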
#> Joining, by = c("ID", "date")
#> # A tibble: 6 x 5
#>      ID date       `call volume` `put volume` `total volume`
#>   <dbl> <chr>              <dbl>        <dbl>          <dbl>
#> 1 44652 1997/01/02            34          250            284
#> 2 44652 1997/01/03          1443          211           1654
#> 3 44652 1997/01/04            NA           NA             NA
#> 4 56266 1997/01/02            30            0             30
#> 5 56266 1997/01/03          4490          826           5316
#> 6 56266 1997/01/04            NA           NA             NA

Created on 2020-10-07 by the reprex package (v0.3.0)
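The reason `left_join()` fits here is that it keeps every row of the left-hand frame and fills NA where the right-hand frame has no matching key, which is exactly what happens to the 1997/01/04 rows above. A minimal sketch with hypothetical toy frames `left`, `right`, and `joined`, alongside the base-R equivalent `merge(all.x = TRUE)`:

```r
library(dplyr)

# Hypothetical toy frames: the 1997/01/04 row in `left` has no partner in `right`
left  <- data.frame(ID = c(1, 1), date = c("1997/01/02", "1997/01/04"))
right <- data.frame(ID = 1, date = "1997/01/02", C = 34)

# left_join() keeps every row of `left`; unmatched rows get NA in the new column
joined <- left_join(left, right, by = c("ID", "date"))
print(joined)

# Base-R equivalent: all.x = TRUE keeps unmatched rows of the first frame
# (note that merge() sorts the result by the key columns)
base_joined <- merge(left, right, by = c("ID", "date"), all.x = TRUE)
print(base_joined)
```

An `inner_join()` would instead drop the unmatched rows, so dataset2 would lose its 1997/01/04 entries.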