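Problem description
Hi all. I want to connect to a Hive database in HDInsight using Python. I have followed several blogs and a few Stack Overflow posts, but with no luck. Below are my attempts using the pyhive and JayDeBeApi libraries.
Using JayDeBeApi
I added the hive-jdbc-1.2.1, httpclient-4.4, and httpcore-4.4.4 jars to the current working directory and installed Thrift with pip install thrift. The code snippet is: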
import jaydebeapi
conn = jaydebeapi.connect("org.apache.hive.jdbc.HiveDriver","jdbc:hive2://shaktiman.database.windows.net:443/;ssl=true;transportMode=http;httpPath=/hive2",['admin','Abcdeertyoiu@1234'],"hive-jdbc-1.2.1.jar")
cursor = conn.cursor()
cursor.execute("select * from default.hivesampletable limit 50")
print(cursor.description) # prints the result set's schema
results = cursor.fetchall()
But I get the following error:
Traceback (most recent call last):
File "ClassLoader.java",line 357,in java.lang.ClassLoader.loadClass
File "Launcher.java",line 349,in sun.misc.Launcher$AppClassLoader.loadClass
File "ClassLoader.java",line 424,in java.lang.ClassLoader.loadClass
File "URLClassLoader.java",line 382,in java.net.URLClassLoader.findClass
java.lang.ClassNotFoundException: java.lang.ClassNotFoundException: org.apache.hive.service.cli.thrift.TCLIService$Iface
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "org.jpype.JPypeContext.java",line 330,in org.jpype.JPypeContext.callMethod
File "Method.java",line 498,in java.lang.reflect.Method.invoke
File "DelegatingMethodAccessorImpl.java",line 43,in sun.reflect.DelegatingMethodAccessorImpl.invoke
File "NativeMethodAccessorImpl.java",line 62,in sun.reflect.NativeMethodAccessorImpl.invoke
File "NativeMethodAccessorImpl.java",line -2,in sun.reflect.NativeMethodAccessorImpl.invoke0
File "DriverManager.java",line 247,in java.sql.DriverManager.getConnection
File "DriverManager.java",line 664,in java.sql.DriverManager.getConnection
File "HiveDriver.java",line 105,in org.apache.hive.jdbc.HiveDriver.connect
Exception: Java Exception
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:/Learning Dir/PycharmProjects/Python/HdInsight-Hive/test.py",line 39,in <module>
"hive-jdbc-1.2.1.jar")
File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\jaydebeapi\__init__.py",line 412,in connect
jconn = _jdbc_connect(jclassname,url,driver_args,jars,libs)
File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\jaydebeapi\__init__.py",line 230,in _jdbc_connect_jpype
return jpype.java.sql.DriverManager.getConnection(url,*dargs)
java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: org/apache/hive/service/cli/thrift/TCLIService$Iface
I'm not sure what the problem is.
I also tried PyHive, as shown below:
from pyhive import hive
conn = hive.connect('hn0-shaktiman-po.ttl4q3khoz5uvb1d4jopix3kbg.cx.internal.cloudapp.net',port=10000,auth='NOSASL')
cursor = conn.cursor()
cursor.execute('SHOW DATABASES')
cursor.fetchall()
But I still get an error:
"D:\Learning Dir\PycharmProjects\Python\venv\Scripts\python.exe" "D:/Learning Dir/PycharmProjects/Python/HdInsight-Hive/test2.py"
Failed to resolve sockaddr for hn0-shaktiman-po.ttl4q3khoz5uvb1d4jopix3kbg.cx.internal.cloudapp.net:10000
Traceback (most recent call last):
File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\thrift\transport\TSocket.py",line 99,in open
addrs = self._resolveAddr()
File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\thrift\transport\TSocket.py",line 42,in _resolveAddr
socket.AI_PASSIVE | socket.AI_ADDRCONFIG)
File "D:\Installation\Python\python38-32\lib\socket.py",line 752,in getaddrinfo
for res in _socket.getaddrinfo(host,port,family,type,proto,flags):
socket.gaierror: [Errno 11001] getaddrinfo failed
During handling of the above exception,another exception occurred:
Traceback (most recent call last):
File "D:/Learning Dir/PycharmProjects/Python/HdInsight-Hive/test2.py",line 2,in <module>
conn = hive.connect('hn0-shaktiman-po.ttl4q3khoz5uvb1d4jopix3kbg.cx.internal.cloudapp.net',auth='NOSASL')
File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\pyhive\hive.py",line 94,in connect
return Connection(*args,**kwargs)
File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\pyhive\hive.py",line 192,in __init__
self._transport.open()
File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\thrift\transport\TTransport.py",line 155,in open
return self.__trans.open()
File "D:\Learning Dir\PycharmProjects\Python\venv\lib\site-packages\thrift\transport\TSocket.py",line 103,in open
raise TTransportException(type=TTransportException.NOT_OPEN,message=msg,inner=gai)
thrift.transport.TTransport.TTransportException: Failed to resolve sockaddr for hn0-shaktiman-po.ttl4q3khoz5uvb1d4jopix3kbg.cx.internal.cloudapp.net:10000
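In addition, a few blogs suggested changing the hiveserver2 transport mode from "http" to "binary". I tried that, but it did not help either...
If anyone can suggest working code or a solution, I would really appreciate it. Thanks in advance.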
Solution
This looks like a configuration/network issue to me.
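- Can you verify connectivity from the host where the application is submitted to the HDI cluster? (You can skip this if you are submitting from the HDI head node.) Try using the IP address here instead of hn0-shaktiman-po.ttl4q3khoz5uvb1d4jopix3kbg.cx.internal.cloudapp.net. You can get the IP address by running curl ifconfig.me in the HDI cluster.
- Also try using telnet to check that the port is not in use elsewhere. Try using 10001.
- Try changing the value of hive.server2.transport.mode to http in Ambari.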
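The connectivity checks suggested above (name resolution and port reachability) can also be run from Python on the client machine. The following is only a minimal sketch using the standard socket module. The helper names resolve_host, port_is_open, and check_hive_endpoint are hypothetical and not part of pyhive or JayDeBeApi, and the default ports are simply the two mentioned in this thread.

```python
import socket


def resolve_host(host):
    """Return the sorted IP addresses a host name resolves to, or [] if DNS lookup fails."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        # Same failure that PyHive reported as "Failed to resolve sockaddr".
        return []
    return sorted({info[4][0] for info in infos})


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable names.
        return False


def check_hive_endpoint(host, ports=(10000, 10001)):
    """Report DNS resolution and TCP reachability for each candidate port.

    The ports 10000 and 10001 are assumptions taken from this thread,
    not values read from the cluster configuration.
    """
    addresses = resolve_host(host)
    reachable = {
        port: bool(addresses) and port_is_open(host, port) for port in ports
    }
    return {"host": host, "addresses": addresses, "ports": reachable}
```

Calling check_hive_endpoint with the head node's host name from the question and getting an empty addresses list would point to the same DNS problem shown in the PyHive traceback. In that case, passing the node's IP address instead separates the name-resolution problem from the question of whether the port is reachable at all.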