How can I write a lazy (virtual) BQ table back to BQ using R DBI and bigrquery?

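Problem description

I would like to be able to:

  1. Access a BQ table. Its class is
[1] "tbl_BigQueryConnection" "tbl_dbi"                "tbl_sql"               
[4] "tbl_lazy"               "tbl"
  2. Use dbplyr to modify the table and create a new lazy table. Again, its class is
[1] "tbl_BigQueryConnection" "tbl_dbi"                "tbl_sql"               
[4] "tbl_lazy"               "tbl"
  3. Write this new table to BQ.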

I get the following error:

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'dbWriteTable' for signature '"BigQueryConnection", "character", "tbl_BigQueryConnection"'

MRE

library(DBI)
library(dplyr, warn.conflicts = FALSE)
library(bigrquery)

############  CREATE BQ TABLE TO ACCESS  #################
dataset = bq_dataset(bq_test_project(), "test_dataset")

if (bq_dataset_exists(dataset))
{
  bq_dataset_delete(dataset, delete_contents = TRUE)
}
#> Suitable tokens found in the cache, associated with these emails:
#>   * ariel.balter@gmail.com
#>   * ariel.balter@providence.org
#> The first will be used.
#> Using an auto-discovered, cached token.
#> To suppress this message, modify your code or options to clearly consent to the use of a cached token.
#> See gargle's "Non-interactive auth" vignette for more details:
#> https://gargle.r-lib.org/articles/non-interactive-auth.html
#> The bigrquery package is using a cached token for ariel.balter@gmail.com.

bq_dataset_create(dataset)
#> <bq_dataset> elite-magpie-257717.test_dataset

conn = DBI::dbConnect(
  bigrquery::bigquery(),
  project = bq_test_project(),
  dataset = "test_dataset",
  KeyFilePath = "google_service_key.json",
  OAuthMechanism = 0
)


if (dbExistsTable(conn, "mtcars"))
{
  dbRemoveTable(conn, "mtcars")
}

dbWriteTable(conn, "mtcars", mtcars)

#######################################################


### Access BQ table
mtcars_tbl = tbl(conn, "mtcars")
class(mtcars_tbl)
#> [1] "tbl_BigQueryConnection" "tbl_dbi"                "tbl_sql"               
#> [4] "tbl_lazy"               "tbl"

### Create new virtual table
hp_gt_100_tbl = mtcars_tbl %>% filter(hp > 100)
class(hp_gt_100_tbl)
#> [1] "tbl_BigQueryConnection" "tbl_dbi"                "tbl_sql"               
#> [4] "tbl_lazy"               "tbl"

### Write new table
dbWriteTable(conn, "hp_gt_100", hp_gt_100_tbl)
#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'dbWriteTable' for signature '"BigQueryConnection", "character", "tbl_BigQueryConnection"'

dbExecute(conn, "DROP TABLE mtcars")
#> [1] 0
dbExecute(conn, "DROP TABLE hp_gt_100")
#> Error: Job 'elite-magpie-257717.job_O8e7BtdfAnAb_8Vdtwybibgd7DpA.US' Failed
#> x Not found: Table elite-magpie-257717:test_dataset.hp_gt_100 [notFound]

Created on 2020-11-11 by the reprex package (v0.3.0)

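Solution

I don't think you can do this with dbWriteTable using your current approach. According to its documentation, dbWriteTable "writes, overwrites or appends a [local] data frame to a database table". It has no method for a lazy remote table, which is why method dispatch fails.

So one option is to collect the data into R and then use dbWriteTable to write it back to the database. But this is likely to be inefficient, because every row has to be downloaded and uploaded again.

The approach I recommend is to build a BigQuery INSERT INTO statement and pass it to dbExecute. Something like the following: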
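The collect-then-write route can be sketched as follows. The helper name `write_lazy_via_collect` is hypothetical, and `conn` is assumed to be any DBI connection, such as the BigQuery connection from the question. All rows pass through the R session, so this is only reasonable for results that fit in memory.

```r
library(DBI)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not part of DBI or bigrquery): materialise a lazy
# table locally, then upload the resulting data frame as a new table.
write_lazy_via_collect <- function(conn, name, lazy_tbl) {
  # collect() runs the query remotely and downloads every row into R
  local_df <- as.data.frame(dplyr::collect(lazy_tbl))
  # dbWriteTable() has a method for local data frames, so dispatch succeeds
  DBI::dbWriteTable(conn, name, local_df)
  invisible(local_df)
}
```

With the objects from the question, this would be called as `write_lazy_via_collect(conn, "hp_gt_100", hp_gt_100_tbl)`.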

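sql_query <- glue::glue("INSERT INTO {db}.{schema}.{tbl_name}\n", dbplyr::sql_render(input_tbl))

result <- dbExecute(db_connection, as.character(sql_query))

sql_render takes the definition of the current lazy table and returns the query text. dbExecute passes this statement to the BigQuery server for execution, so no data is pulled into R.

Note that I'm not familiar enough with BigQuery's INSERT INTO syntax to guarantee that sql_query above is syntactically correct, but I know the general approach works because I use dbplyr and DBI extensively with SQL Server.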
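Because INSERT INTO needs the target table to exist already, a CREATE TABLE ... AS SELECT statement may be a closer fit when the goal is a brand-new table. The sketch below is a variant of the approach above under stated assumptions, not the answer's original code: `create_table_from_lazy` is a hypothetical helper, and `target` is assumed to be an already-qualified, trusted table name such as `test_dataset.hp_gt_100`.

```r
library(DBI)

# Hypothetical helper: build "CREATE TABLE <target> AS <query>" from a lazy
# table and run it on the server, so no rows travel through R.
create_table_from_lazy <- function(conn, target, lazy_tbl) {
  # sql_render() returns the SELECT statement that defines the lazy table
  select_sql <- as.character(dbplyr::sql_render(lazy_tbl))
  # target is pasted in verbatim (assumption: the caller supplies a
  # correctly qualified name and handles any quoting it needs)
  statement <- paste0("CREATE TABLE ", target, " AS\n", select_sql)
  DBI::dbExecute(conn, statement)
  invisible(statement)
}
```

With the objects from the question, this would be called as `create_table_from_lazy(conn, "test_dataset.hp_gt_100", hp_gt_100_tbl)`. BigQuery also accepts `CREATE OR REPLACE TABLE` if an existing table should be overwritten.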

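I accepted Simon S.A.'s answer. However, I did manage to find a more direct approach using the bigrquery function bq_project_query, which runs the rendered query on the server and writes the result to a destination table.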

library(DBI)
library(dplyr, warn.conflicts = FALSE)
library(bigrquery)

bq_deauth()
bq_auth(email = "ariel.balter@gmail.com")


############  CREATE BQ TABLE TO ACCESS  #################

dataset = bq_dataset("elite-magpie-257717", "test_dataset")

if (bq_dataset_exists(dataset))
{
  bq_dataset_delete(dataset, delete_contents = TRUE)
}
bq_dataset_create(dataset)
#> <bq_dataset> elite-magpie-257717.test_dataset

conn = dbConnect(
  bigrquery::bigquery(),
  project = "elite-magpie-257717",
  dataset = "test_dataset",
  KeyFilePath = "google_service_key.json",
  OAuthMechanism = 0
)

dbWriteTable(conn, "mtcars", mtcars, overwrite = TRUE)

dbListTables(conn)
#> [1] "mtcars"

#######################################################


### Access BQ table
mtcars_tbl = tbl(conn, "test_dataset.mtcars")
class(mtcars_tbl)
#> [1] "tbl_BigQueryConnection" "tbl_dbi"                "tbl_sql"               
#> [4] "tbl_lazy"               "tbl"

### Create new virtual table
hp_gt00_tbl = mtcars_tbl %>% filter(hp > 100)
class(hp_gt00_tbl)
#> [1] "tbl_BigQueryConnection" "tbl_dbi"                "tbl_sql"               
#> [4] "tbl_lazy"               "tbl"

hp_gt00_tbl %>% dbplyr::sql_render()
#> <SQL> SELECT *
#> FROM `test_dataset.mtcars`
#> WHERE (`hp` > 100.0)

### Run the rendered query server-side and save the result as a new table
bq_project_query(
  x = dataset$project,
  query = hp_gt00_tbl %>% dbplyr::sql_render(),
  destination_table = bq_table(dataset, "hp_gt_00")
)
#> <bq_table> elite-magpie-257717.test_dataset.hp_gt_00

bq_dataset_tables(dataset)
#> [[1]]
#> <bq_table> elite-magpie-257717.test_dataset.hp_gt_00
#> 
#> [[2]]
#> <bq_table> elite-magpie-257717.test_dataset.mtcars

bq_dataset_delete(dataset, delete_contents = TRUE)

Created on 2020-11-15 by the reprex package (v0.3.0)