Reading from Azure Event Hubs with Spark => StreamingQueryException: Input byte array has wrong 4-byte ending unit

Problem description

I'm trying to read Azure Event Hubs messages with Spark / Python. Every time, I get the exception "StreamingQueryException: Input byte array has wrong 4-byte ending unit".

Any ideas?

conf = {}
conf["eventhubs.connectionString"] = "Endpoint=sb://XXXXXXXXX.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=XXXXXXXXXXXXX=;EntityPath=XXXXXX"
                                      
read_df = spark.readStream.format("eventhubs").options(**conf).load()
stream = read_df.writeStream.format("console").start()
stream.awaitTermination()

Solution

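Note that for version 2.3.15 and above of the azure-event-hubs-spark connector, you need to encrypt the connection string before putting it in the configuration dictionary. The message in the exception is the one java.util.Base64 raises when it is asked to decode malformed input, which suggests the connector is trying to decrypt a connection string that was passed in as plain text: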

# sc is the SparkContext (spark.sparkContext); connectionString is the plain-text connection string
conf["eventhubs.connectionString"] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)

https://github.com/Azure/azure-event-hubs-spark/blob/master/docs/PySpark/structured-streaming-pyspark.md#event-hubs-configuration
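
Putting the question's code and the fix together, a minimal sketch could look like the following. It assumes the connector is installed on the cluster and that `spark` is an active SparkSession. The helper names `build_eventhubs_conf` and `start_console_stream` are made up for this illustration and are not part of the connector; only `EventHubsUtils.encrypt` comes from the linked documentation.

```python
from typing import Callable, Dict


def build_eventhubs_conf(connection_string: str,
                         encrypt: Callable[[str], str]) -> Dict[str, str]:
    """Return connector options with the connection string encrypted.

    Hypothetical helper: `encrypt` is expected to be the connector's own
    sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt.
    """
    return {"eventhubs.connectionString": encrypt(connection_string)}


def start_console_stream(spark, connection_string: str):
    """Read from Event Hubs and print each micro-batch to the console.

    Hypothetical helper; needs a cluster with the Event Hubs connector.
    """
    sc = spark.sparkContext
    # Encrypt through the JVM-side utility instead of passing plain text.
    encrypt = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt
    conf = build_eventhubs_conf(connection_string, encrypt)
    read_df = spark.readStream.format("eventhubs").options(**conf).load()
    return read_df.writeStream.format("console").start()
```

On a cluster, `start_console_stream(spark, connectionString).awaitTermination()` would then replace the last three lines of the question's code. Keeping the encryption step in its own function makes it possible to check the option dictionary without a running Spark session.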