PySpark hangs when running in JupyterLab with a model of more than 200 variables

Problem description

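I am having a problem running PySpark in JupyterLab.

PySpark hangs when I run it in JupyterLab with a model that has more than 200 variables (perhaps because of the amount of computation; I really don't know). The cell just shows [*] and nothing happens. The same code, however, runs fine from the console with the spark-submit command.

The hang occurs only inside Jupyter (Notebook or Lab), whether the code is run through PySpark or Scala.

Has anyone run into this, or does anyone have an idea what causes it?

Regards,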

from pyspark.ml import Pipeline

pipeline = Pipeline(stages = stages)
pipelineModel = pipeline.fit(df)
df = pipelineModel.transform(df)
# selectedCols = ['label','features'] + cols
selectedCols = ['label','features']
df = df.select(selectedCols)

df.printSchema()

root
 |-- label: double (nullable = false)
 |-- features: vector (nullable = true)

df.show(5,truncate = False)

+-----+----------------------------------------------------------------------------------------------+
|label|features                                                                                      |
+-----+----------------------------------------------------------------------------------------------+
|0.0  |(218,[0,47,212,213,214,215,216,217],[1.0,1.0,300.0,0.0547,5.6343124E7,1.5024833E7,179.0]) |
|0.0  |(218,[1,29,240.0,0.1122,2.66564E8,8.8854667E7,155.0])   |
|1.0  |(218,13,84.0,0.134,4.3268104E7,1.2980431E7,139.0])           |
|0.0  |(218,342.0,0.05474,3.3063426E8,6.6126852E7,132.0])|
|1.0  |(218,112,119.0,0.1324,1685000.0,898667.0,119.0])     |
+-----+----------------------------------------------------------------------------------------------+

from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(featuresCol = 'features',labelCol = 'label',maxIter=10)
lrModel = lr.fit(train)
print("Coefficients of Logistic Regression: \n" + str(lrModel.coefficientMatrix))
print("Intercept of Logistic Regression: " + str(lrModel.interceptVector))

Solution

No effective solution to this problem has been found yet.
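
Since the same code reportedly runs under spark-submit but hangs in the notebook, one low-risk first step is to compare the Spark configuration that each environment actually uses. The sketch below is only a diagnostic aid under that assumption, not a confirmed fix. `diff_spark_conf` is a hypothetical helper (it is not part of PySpark), and the sample settings are invented for illustration; in practice the pairs would come from `spark.sparkContext.getConf().getAll()` in each environment.

```python
def diff_spark_conf(notebook_conf, submit_conf):
    """Return {key: (notebook_value, submit_value)} for settings that differ.

    Both arguments are iterables of (key, value) pairs, the shape returned by
    spark.sparkContext.getConf().getAll(). A missing key is reported as None.
    """
    a, b = dict(notebook_conf), dict(submit_conf)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Hypothetical values for illustration only; replace them with the output of
# spark.sparkContext.getConf().getAll() captured in Jupyter and in spark-submit.
notebook = [("spark.master", "local[*]"), ("spark.driver.memory", "1g")]
submit = [
    ("spark.master", "yarn"),
    ("spark.driver.memory", "4g"),
    ("spark.executor.memory", "4g"),
]

differences = diff_spark_conf(notebook, submit)
for key, (nb_value, submit_value) in differences.items():
    print(f"{key}: notebook={nb_value!r} spark-submit={submit_value!r}")
```

Any setting that shows up here (master, driver or executor memory, and so on) is a candidate to align between the two environments before looking further.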


apache-spark jupyter jupyter-kernel jupyter-lab pyspark
