pyodbc error - ('ODBC SQL type -151 is not yet supported. column-index=16 type=-151', 'HY106')

Problem description

I am using Python and pyodbc to automate some query extractions, convert the results to Parquet, and send them to AWS S3.

So far my scripted solution has worked fine, but I have run into a problem. I have a schema, call it SCHEMA_A, containing several tables: TABLE_1, TABLE_2 .... TABLE_N.

All of the tables in that schema can be accessed with the same credentials.

So I am using a script like this to automate the task:

import sys
import time

import pandas as pd
import pyodbc

def get_stream(cursor, batch_size=100000):
    # fetchmany returns an empty list once the result set is exhausted
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


cnxn = pyodbc.connect(driver='pyodbc driver here', host='host name', database='schema name', user='user name', password='password')
print('Connection established ...')

cursor = cnxn.cursor()
print('Initializing cursor ...')

if len(sys.argv) > 1:
    table_name = sys.argv[1]
    cursor.execute('SELECT * FROM {}'.format(table_name))
else:
    exit()
print('Query fetched ...')

row_batch = get_stream(cursor)
print('Getting Iterator ...')

cols = cursor.description
cols = [col[0] for col in cols]

print('Initializing batch data frame ...')
df = pd.DataFrame(columns=cols)

start_time = time.time()
for rows in row_batch:
    tmp = pd.DataFrame.from_records(rows, columns=cols)
    df = df.append(tmp, ignore_index=True)
    tmp = None
    print("--- Batch inserted in %s seconds ---" % (time.time() - start_time))
    start_time = time.time()
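
As a side note on the batching loop: growing the frame with df.append copies every previously accumulated row on each iteration, and DataFrame.append was removed in pandas 2.0. A minimal sketch of the usual alternative, collecting the batches in a list and concatenating once (names reuse the script above):

frames = []
for rows in row_batch:
    # build one small frame per batch instead of growing df in place
    frames.append(pd.DataFrame.from_records(rows, columns=cols))

# a single concat at the end avoids the quadratic copying of repeated appends
df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame(columns=cols)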

I run similar code as an Airflow task, and it works fine for all the other tables. But then there are two tables, call them TABLE_I and TABLE_II, that raise the following error when cursor.fetchmany(batch_size) is executed:

ERROR - ('ODBC sql type -151 is not yet supported.  column-index=16  type=-151', 'HY106')
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1310, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 117, in execute
    return_value = self.execute_callable()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 128, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/home/ubuntu/prea-ninja-airflow/jobs/plugins/extract/fetch.py", line 58, in fetch_data
    for rows in row_batch:
  File "/home/ubuntu/prea-ninja-airflow/jobs/plugins/extract/fetch.py", line 27, in stream
    row = cursor.fetchmany(batch_size)

Inspecting these tables with sqlElectron and querying the first few rows, I realized that TABLE_I and TABLE_II both have a column named "Geolocalizacao". When I looked up that column's data type in SQL Server:

SELECT DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE
     TABLE_NAME = 'TABLE_I' AND
     COLUMN_NAME = 'Geolocalizacao';

it produces:

DATA_TYPE
geography

Searching here on Stack Overflow, I found this solution: python pyodbc SQL Server Native Client 11.0 cannot return geometry column

According to that user's description, adding the following seems to make it work:

def unpack_geometry(raw_bytes):
    # adapted from SSCLRT information at
    #   https://docs.microsoft.com/en-us/openspecs/sql_server_protocols/ms-ssclrt/dc988cb6-4812-4ec6-91cd-cce329f6ecda
    tup = struct.unpack('<i2b3d', raw_bytes)
    # tup contains: (unknown, Version, Serialization_Properties, X, Y, SRID)
    return tup[3], tup[4], tup[5]

and then:

cnxn.add_output_converter(-151,unpack_geometry)

after creating the connection. But it does not work for the geography data type; when I use this code (adding import struct to the Python script), it gives me the following error:

Traceback (most recent call last):
  File "benchmark.py", line 79, in <module>
    for rows in row_batch:
  File "benchmark.py", line 39, in get_stream
    row = cursor.fetchmany(batch_size)
  File "benchmark.py", line 47, in unpack_geometry
    tup = struct.unpack('<i2b3d', raw_bytes)
struct.error: unpack requires a buffer of 30 bytes
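
The format string '<i2b3d' demands exactly 30 bytes (a 4-byte int, two 1-byte fields, and three 8-byte doubles), so the converter crashes whenever the serialized value has any other length. Per the MS-SSCLRT layout linked above, a plain 2-D geography point is only 22 bytes: a 4-byte SRID, a 1-byte version, a 1-byte properties flag, and two doubles. A length-aware sketch under that assumption (unpack_geography is an illustrative name, registered with add_output_converter the same way; anything that is not a simple point is handed back as raw bytes rather than guessed at):

import struct

def unpack_geography(raw_bytes):
    if raw_bytes is None:
        return None  # defensive: NULL values
    # header per MS-SSCLRT: 4-byte SRID, 1-byte version, 1-byte flags
    srid, version, flags = struct.unpack_from('<i2B', raw_bytes)
    # a plain 2-D point is exactly 22 bytes: 6-byte header + two doubles
    # (stored as Lat, Long for the geography type)
    if len(raw_bytes) == 22:
        lat, long = struct.unpack_from('<2d', raw_bytes, 6)
        return srid, lat, long
    # multipoints, lines, Z/M values and empty shapes use longer,
    # variable layouts; return the raw bytes instead of crashing
    return raw_bytes

cnxn.add_output_converter(-151, unpack_geography)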

An example of the values this column holds, following the given template:

{"srid":4326,"version":1,"points":[{}],"figures":[{"attribute":1,"pointOffset":0}],"shapes":[{"parentOffset":-1,"figureOffset":0,"type":1}],"segments":[]}

Honestly, I have no idea how to adapt the code to this structure. Can someone help me? The script works fine for all the other tables, but these two tables with this column are giving me a headache.

Workaround
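
The original thread does not include an accepted fix, but one approach that is often suggested for this error is to avoid the client-side conversion entirely: have SQL Server serialize the column to well-known text with the geography type's STAsText() method, so pyodbc only ever receives plain text. A minimal sketch under that assumption (Col1 and Col2 are hypothetical stand-ins for the table's real columns; the connection object is the one from the script above):

cursor = cnxn.cursor()

# list the columns explicitly and convert the geography column server-side;
# SELECT * would still ship the raw UDT and trigger the HY106 error
cursor.execute("""
    SELECT Col1, Col2, Geolocalizacao.STAsText() AS Geolocalizacao
    FROM TABLE_I
""")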
