Cannot read SQL table correctly in Python: varchar columns imported as comma-separated characters/tuples

Problem description

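I am connecting to an Oracle database with the following code:

```python
import jaydebeapi
import jpype

jar = r'path\to\ojdbc8.jar'    # placeholder: path to the Oracle JDBC driver
jvm_path = r'path\to\jvm.dll'  # placeholder: path to the JVM library
args = '-Djava.class.path=%s' % jar
jpype.startJVM(jvm_path, args)
con = jaydebeapi.connect("oracle.jdbc.driver.OracleDriver", url, [user, password], jar)
```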

The connection works fine, but the data is returned in this odd format:

```python
df = pd.read_sql("SELECT * FROM table1", con)
```

returns:

+---+-----------------+-----------------+-----------------+
|   | (C,O,L,U,M,N,1) | (C,O,L,U,M,N,2) | (C,O,L,U,M,N,3) |
+---+-----------------+-----------------+-----------------+
| 1 | (t,e,s,t)       | (t,e,s,t,2)     | 1               |
+---+-----------------+-----------------+-----------------+
| 2 | (f,o,o)         | (b,a,r)         | 100             |
+---+-----------------+-----------------+-----------------+

Numbers and dates are imported correctly, but varchar columns are not. I tried different tables, and all of them have this problem.

I haven't seen anything like this anywhere. I hope you can help me.

Solution

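This appears to be an issue when using jaydebeapi together with jpype. I can reproduce it when connecting to an Oracle database the same way you do (Oracle 11gR2 in my case, but since you are using ojdbc8.jar, I assume it also happens with other versions).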

There are several ways to work around this:

Change your connection

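Since the error seems to occur only with a particular combination of packages, the most sensible approach is to avoid that combination and thereby avoid the error altogether.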

Alternative 1: Use jaydebeapi without jpype:

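As mentioned above, I only observed this when using jaydebeapi together with jpype. In my case, however, there is no need to use jpype directly: I have the .jar file locally, and the connection works fine without it: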

```python
import os

import jaydebeapi as jdba
import pandas as pd

db_host = 'db.host.com'
db_port = 1521
db_sid = 'YOURSID'

# Oracle JDBC driver located in the current working directory
jar = os.path.join(os.getcwd(), 'ojdbc6.jar')

conn = jdba.connect('oracle.jdbc.driver.OracleDriver',
                    f'jdbc:oracle:thin:@{db_host}:{db_port}:{db_sid}',
                    {'user': 'USERNAME', 'password': 'PASSWORD'},
                    jar)

df_jay = pd.read_sql('SELECT * FROM YOURSID.table1', conn)

conn.close()
```

In my case, this works fine and creates the dataframe as expected.
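If you do need to start the JVM yourself, as in the question, one likely cause worth checking is JPype's string conversion. In recent JPype versions, `jpype.startJVM` defaults to `convertStrings=False`, so `java.lang.String` values come back as Java string objects instead of Python `str`, and pandas renders those as character sequences. Recent jaydebeapi versions pass `convertStrings=True` when they start the JVM themselves, which would explain why dropping the manual `startJVM` call helps. The sketch below is an assumption-based variant, not part of the answer above: it assumes JPype 0.7 or later, and `connect_oracle` is a hypothetical helper name; paths, URL and credentials are placeholders.

```python
def connect_oracle(jvm_path, jar, url, user, password):
    """Hypothetical helper: start the JVM with string conversion, then connect via jaydebeapi."""
    # Imported here so the helper can be defined even where JPype/jaydebeapi are not installed
    import jaydebeapi
    import jpype

    if not jpype.isJVMStarted():
        # Assumption: JPype >= 0.7, where startJVM accepts convertStrings.
        # convertStrings=True returns java.lang.String values as Python str.
        jpype.startJVM(jvm_path, '-Djava.class.path=%s' % jar, convertStrings=True)
    return jaydebeapi.connect('oracle.jdbc.driver.OracleDriver', url, [user, password], jar)
```

Note that the JVM can only be started once per Python process, so the conversion setting has to be chosen on that first `startJVM` call.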

Alternative 2: Use cx_Oracle instead:

The problem also does not occur if I connect to the Oracle database with cx_Oracle:

```python
import cx_Oracle
import pandas as pd

db_host = 'db.host.com'
db_port = 1521
db_sid = 'YOURSID'

dsn_tns = cx_Oracle.makedsn(db_host, db_port, db_sid)
cx_conn = cx_Oracle.connect('USERNAME', 'PASSWORD', dsn_tns)

df_cxo = pd.read_sql('SELECT * FROM YOURSID.table1', con=cx_conn)

cx_conn.close()
```

Note: For cx_Oracle to work, the Oracle Instant Client must be installed and set up correctly (see, e.g., the cx_Oracle documentation for Ubuntu).

Fix the dataframe afterwards:

If for some reason you cannot use the connection alternatives above, you can also convert your dataframe.

Alternative 3: Join the tuple entries:

You can use ''.join() to convert tuples to strings. You need to do this for both the entries and the column names.

```python
# df is the DataFrame returned by pd.read_sql(...)
# For all entries in object (text) columns that are not None, join the tuples
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].apply(lambda x: ''.join(x) if x is not None else x)

# Also rename the column headings in the same way
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)
```
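To try the join approach without a database, here is a self-contained sketch on hypothetical data that mimics the garbled output from the question. It assumes the cells really are Python tuples of characters, and it swaps the `x is not None` test for an `isinstance` check (via a hypothetical helper `join_chars`) so numbers and plain strings pass through unchanged. It renames the headings first so the columns can then be addressed by their plain names.

```python
import pandas as pd


def join_chars(x):
    # Join a tuple/list of characters into one string; leave anything else unchanged
    if isinstance(x, (tuple, list)):
        return ''.join(x)
    return x


# Hypothetical frame mimicking the garbled read_sql result
df = pd.DataFrame({
    'c1': [('t', 'e', 's', 't'), ('f', 'o', 'o')],
    'c2': [('t', 'e', 's', 't', '2'), ('b', 'a', 'r')],
    'c3': [1, 100],
})
# Headings as character tuples; tupleize_cols=False keeps a flat Index, not a MultiIndex
df.columns = pd.Index([tuple('COLUMN1'), tuple('COLUMN2'), tuple('COLUMN3')],
                      tupleize_cols=False)

# Fix the headings first
df = df.rename(columns=join_chars)

# Then join the character tuples in every object-dtype column
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].map(join_chars)

print(df)
```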

Alternative 4: Change the column dtype:

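Changing the dtype of the affected columns from object to string converts all entries as well. Note that this can have unwanted side effects: for example, None values become pandas' missing-value marker <NA>. You also have to rename the column headings separately, as described above.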
```python
# df is the DataFrame returned by pd.read_sql(...)
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].astype('string')

# Again, rename the headings separately
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)
```
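A small sketch of the side effect mentioned above, on hypothetical data. `astype('string')` turns None into `pd.NA` (displayed as <NA>), and it builds each value with `str()`, so it only yields clean text when `str()` of a cell already returns the intended text, as appears to be the case for the string-like objects in the question. A genuine Python tuple of characters would instead become its repr.

```python
import pandas as pd

# Hypothetical object column with a missing value
s = pd.Series(['foo', None], dtype=object)
converted = s.astype('string')
print(converted)  # the None entry is now pd.NA, shown as <NA>

# astype('string') calls str() on each element, so a real tuple keeps its repr
t = pd.Series([('f', 'o', 'o'), ('b', 'a', 'r', '2')], dtype=object).astype('string')
print(t[0])
```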

All of these should ultimately give you more or less the same df (apart from the dtypes and possibly the replaced None values):

+---+---------+---------+---------+
|   | COLUMN1 | COLUMN2 | COLUMN3 |
+---+---------+---------+---------+
| 1 | test    | test2   | 1       |
+---+---------+---------+---------+
| 2 | foo     | bar     | 100     |
+---+---------+---------+---------+
