Problem description
library(magrittr)  # provides the %>% pipe
con <- DBI::dbConnect(
  bigrquery::bigquery(), project = "project", dataset = "dataset"
)
table1 <- dplyr::tbl(con, "table1")
table2 <- dplyr::tbl(con, "table2")
df <- table1 %>%
  dplyr::filter(code == "48140") %>%
  dplyr::left_join(table2, by = c("id", "year")) %>%
  dplyr::collect()
Most of my data looks the way I expect, but I also get 7,656 rows that look like this:
id year sex race ethnicity codex code hourwks inout train age yr syr nextyr colorof goingto boolean1 boolean2 int1 int2 height weight boolean3 boolean4
<int> <dbl> <lgl> <int> <lgl> <chr> <chr> <dbl> <lgl> <int> <int> <date> <date> <date> <lgl> <int> <int> <int> <int> <int> <int> <int> <lgl> <lgl>
1 0 0 FALSE 0 FALSE "" "" 0 FALSE 0 0 1970-01-01 1970-01-01 1970-01-01 FALSE 0 0 0 0 0 0 0 FALSE FALSE
2 0 0 FALSE 0 FALSE "" "" 0 FALSE 0 0 1970-01-01 1970-01-01 1970-01-01 FALSE 0 0 0 0 0 0 0 FALSE FALSE
3 0 0 FALSE 0 FALSE "" "" 0 FALSE 0 0 1970-01-01 1970-01-01 1970-01-01 FALSE 0 0 0 0 0 0 0 FALSE FALSE
4 0 0 FALSE 0 FALSE "" "" 0 FALSE 0 0 1970-01-01 1970-01-01 1970-01-01 FALSE 0 0 0 0 0 0 0 FALSE FALSE
5 0 0 FALSE 0 FALSE "" "" 0 FALSE 0 0 1970-01-01 1970-01-01 1970-01-01 FALSE 0 0 0 0 0 0 0 FALSE FALSE
6 0 0 FALSE 0 FALSE "" "" 0 FALSE 0 0 1970-01-01 1970-01-01 1970-01-01 FALSE 0 0 0 0 0 0 0 FALSE FALSE
7 0 0 FALSE 0 FALSE "" "" 0 FALSE 0 0 1970-01-01 1970-01-01 1970-01-01 FALSE 0 0 0 0 0 0 0 FALSE FALSE
8 0 0 FALSE 0 FALSE "" "" 0 FALSE 0 0 1970-01-01 1970-01-01 1970-01-01 FALSE 0 0 0 0 0 0 0 FALSE FALSE
9 0 0 FALSE 0 FALSE "" "" 0 FALSE 0 0 1970-01-01 1970-01-01 1970-01-01 FALSE 0 0 0 0 0 0 0 FALSE FALSE
10 0 0 FALSE 0 FALSE "" "" 0 FALSE 0 0 1970-01-01 1970-01-01 1970-01-01 FALSE 0 0 0 0 0 0 0 FALSE FALSE
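The total number of rows is correct: 20,515.
When I run the following query directly in Google BigQuery, I can show that this is not simply what the data contains: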
SELECT MIN(id) AS min_id, COUNT(*) AS n_rows
FROM table1
LEFT JOIN table2 USING (id, year)
WHERE code IN ("48140")
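It returns the same number of rows (20,515), but min(id) is 28309, not 0. So I know the data is being altered somewhere between BigQuery and R, and I know the dataset does not contain 7,656 identical records.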
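To narrow down where the values change, it may help to count the all-default rows locally and compare that with aggregates computed inside BigQuery. The following is only a minimal sketch: `is_placeholder_row` is a hypothetical helper (not part of bigrquery or dplyr), and the toy data frame stands in for the collected `df`.

```r
# Hypothetical helper: flags rows in which every column holds its type's
# "empty" value (0, "", FALSE, or the date 1970-01-01), like the rows above.
is_placeholder_row <- function(df) {
  zero_like <- function(x) {
    if (is.logical(x)) return(!is.na(x) & !x)
    if (is.character(x)) return(!is.na(x) & x == "")
    if (inherits(x, "Date")) return(!is.na(x) & x == as.Date("1970-01-01"))
    if (is.numeric(x)) return(!is.na(x) & x == 0)
    rep(FALSE, length(x))  # unknown column types never count as placeholders
  }
  # A row is a placeholder only if all of its columns are zero-like.
  Reduce(`&`, lapply(df, zero_like), rep(TRUE, nrow(df)))
}

# Toy stand-in for the collected result: one suspicious row, one real row.
toy <- data.frame(
  id   = c(0L, 28309L),
  year = c(0, 2015),
  code = c("", "48140"),
  yr   = as.Date(c("1970-01-01", "2015-06-01")),
  stringsAsFactors = FALSE
)
n_placeholder <- sum(is_placeholder_row(toy))
print(n_placeholder)

# With a live connection (not run here), the same lazy query can be
# summarised inside BigQuery, so only a single row is downloaded:
# table1 %>%
#   dplyr::filter(code == "48140") %>%
#   dplyr::left_join(table2, by = c("id", "year")) %>%
#   dplyr::summarise(n_rows = dplyr::n(), min_id = min(id, na.rm = TRUE)) %>%
#   dplyr::collect()
```

If the server-side summary reports min_id = 28309 while `sum(is_placeholder_row(df))` on the collected data is 7,656, that would suggest the rows are being altered during download rather than by the join itself.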