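dbplyr: ambiguous column name

Problem description

I am using ODBC and dbplyr to join two fairly simple tables. However, the join fails on my key with an ambiguous column name error. This does not normally happen with dplyr joins, and I don't know how to write the equivalent of a.key = b.key in dbplyr.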

Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC sql Server Driver][sql Server]Ambiguous column name 'Calendar_key'.  [Microsoft][ODBC sql Server Driver][sql Server]Statement(s) Could not be prepared. 
<sql> 'SELECT "Calendar_key","Organization_key","Product_Key","Promotion_Key","Shift_Key","ETL_source_system_key","Pack_Size","Qty_Sold","Inv_Unit_Qty","Extended_Cost","Extended_Purchase_Rebate","Extended_Sales_Rebate","Extended_Sales","Ent_Source_Hdr_Key","Ent_Source_Dtl_Key","Day_Date","Day_Of_Week_ID","Day_Of_Week","Holiday","Type_Of_Day","Calendar_Month_No","Calendar_Month_Name","Calendar_Qtr_No","Calendar_Qtr_Desc","Calendar_Year","Fiscal_Week","Fiscal_Period_No","Fiscal_Period_Desc","Fiscal_Year"
FROM "Item_Sales_Fact" AS "LHS"
LEFT JOIN "calendar" AS "RHS"
ON ("LHS"."Calendar_key" = "RHS"."calendar_key")

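Here is the code. My connection is called con.

library(DBI)
library(odbc)
library(dplyr)

# UID and PWD are redacted in the original post
con <- dbConnect(odbc(), Driver = "SQL Server", Server = "192.168.139.1", Database = "pdi_warehouse_2304_01", UID = XXXX, PWD = XXXX, Port = 1433)

# Lazy references to the remote tables; nothing is fetched yet
item.sales <- tbl(con, "Item_Sales_Fact")
calendar <- tbl(con, "calendar")
organization <- tbl(con, "Organization")

# The join keys differ only in case: Calendar_key vs calendar_key
test.df <- item.sales %>%
  left_join(calendar, by = c("Calendar_key" = "calendar_key")) %>%
  collect()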

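Solution

The SQL generated by dbplyr is incorrect: the unqualified Calendar_key in the SELECT list could come from either RHS or LHS, because SQL Server, unlike R, is not case-sensitive here and does not distinguish Calendar_key from calendar_key.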

SELECT "Calendar_key",...

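The root cause seems to be a mismatch in case sensitivity: dbplyr follows R and treats the two names as different columns, so it does not qualify the column in the SELECT list, while SQL Server (under its default case-insensitive collation) resolves both names to the same identifier.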
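To check what dbplyr would send before running anything, show_query() prints the generated SQL. The sketch below is only an illustration: it uses dbplyr's simulated SQL Server backend (lazy_frame() with simulate_mssql()), so no connection is needed. The two stand-in tables carry just a few of the columns with made-up values, and the table name in the printed SQL is a placeholder. The exact SQL depends on the installed dbplyr version, and newer versions may qualify the columns differently from the query in the error above.

```r
library(dplyr)
library(dbplyr)

# Stand-in lazy tables on a simulated SQL Server backend (assumption:
# only a subset of the real columns, with invented values).
item_sales <- lazy_frame(
  Calendar_key = 1L, Qty_Sold = 2,
  con = simulate_mssql()
)
calendar <- lazy_frame(
  calendar_key = 1L, Day_Date = "2023-01-01",
  con = simulate_mssql()
)

# Same join as in the question: keys differ only in case.
joined <- item_sales %>%
  left_join(calendar, by = c("Calendar_key" = "calendar_key"))

# Print the SQL dbplyr would generate, without executing it.
show_query(joined)
```

With a real connection, the same show_query() call can be placed before collect() to inspect the SELECT list for unqualified column names.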

One workaround is to rename one of the two keys so that both have exactly the same name, including case:

item.sales <- tbl(con,"Item_Sales_Fact")
calendar <- tbl(con,"calendar") %>% rename(Calendar_key = calendar_key)

test.df <- item.sales %>%
  left_join(calendar, by = "Calendar_key") %>%
  collect()
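If the goal is literally to write a.key = b.key, another option is to bypass dbplyr's SQL generation and send a hand-written query through DBI. This is only a sketch: fetch_sales_with_calendar is a hypothetical helper name, and it assumes that Day_Date and Fiscal_Year live in the calendar table (they appear in the error message, but the post does not say which table owns them).

```r
library(DBI)

# Hypothetical helper: runs a hand-written join with explicit table aliases,
# so every column reference is qualified and cannot be ambiguous.
# `con` is an open DBI connection such as the one created in the question.
fetch_sales_with_calendar <- function(con) {
  query <- '
    SELECT a.*, b."Day_Date", b."Fiscal_Year"
    FROM "Item_Sales_Fact" AS a
    LEFT JOIN "calendar" AS b
      ON a."Calendar_key" = b."calendar_key"
  '
  # Assumption: Day_Date and Fiscal_Year are columns of "calendar".
  dbGetQuery(con, query)
}
```

It would be called as test.df <- fetch_sales_with_calendar(con). The trade-off is that the result is fetched immediately as a data frame, so further dplyr verbs run locally rather than in the database.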