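Problem description
I am having a problem running PySpark in Jupyter Lab.
PySpark hangs in Jupyter Lab when I fit a model with more than 200 variables (perhaps because of the amount of computation; I really don't know). The cell just shows [*] and nothing happens. The same code runs fine from the console with spark-submit.
The hang only occurs in Jupyter (Notebook or Lab), whether I run the code through PySpark or Scala.
Has anyone run into this, or does anyone have an idea what causes it?
Regards,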
from pyspark.ml import Pipeline
# `stages` (the feature-engineering stages) and `df` are built earlier; not shown in the post.
pipeline = Pipeline(stages=stages)
pipelineModel = pipeline.fit(df)
df = pipelineModel.transform(df)
# selectedCols = ['label','features'] + cols
selectedCols = ['label','features']
df = df.select(selectedCols)
df.printSchema()
root
|-- label: double (nullable = false)
|-- features: vector (nullable = true)
df.show(5,truncate = False)
+-----+----------------------------------------------------------------------------------------------+
|label|features |
+-----+----------------------------------------------------------------------------------------------+
|0.0 |(218,[0,47,212,213,214,215,216,217],[1.0,1.0,300.0,0.0547,5.6343124E7,1.5024833E7,179.0]) |
|0.0 |(218,[1,29,240.0,0.1122,2.66564E8,8.8854667E7,155.0]) |
|1.0 |(218,13,84.0,0.134,4.3268104E7,1.2980431E7,139.0]) |
|0.0 |(218,342.0,0.05474,3.3063426E8,6.6126852E7,132.0])|
|1.0 |(218,112,119.0,0.1324,1685000.0,898667.0,119.0]) |
+-----+----------------------------------------------------------------------------------------------+
from pyspark.ml.classification import LogisticRegression
# `train` is not defined in the post; it is presumably a training split of `df`.
lr = LogisticRegression(featuresCol='features', labelCol='label', maxIter=10)
lrModel = lr.fit(train)
print("Coefficients of Logistic Regression: \n" + str(lrModel.coefficientMatrix))
print("Intercept of Logistic Regression: " + str(lrModel.interceptVector))
Solution
No working solution to this problem has been found yet.
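One way to narrow the problem down, offered as a sketch rather than a confirmed fix: because the same code completes under spark-submit, it may be worth checking whether the notebook session runs with different Spark settings (for example the master URL or driver memory). Whether a configuration difference is actually the cause here is an assumption. The helper `diff_spark_conf` below is hypothetical and written for illustration. It compares two lists of (key, value) pairs, which is the shape returned by `SparkConf.getAll()`, and the sample values are made up.

```python
from typing import Dict, Iterable, Tuple

ConfPairs = Iterable[Tuple[str, str]]


def diff_spark_conf(notebook_conf: ConfPairs,
                    submit_conf: ConfPairs) -> Dict[str, Tuple[str, str]]:
    """Return the settings whose values differ between two Spark configurations.

    Each argument is an iterable of (key, value) pairs, the shape returned by
    SparkConf.getAll(). A key missing on one side is reported as "<unset>".
    """
    left = dict(notebook_conf)
    right = dict(submit_conf)
    differences = {}
    for key in sorted(set(left) | set(right)):
        a = left.get(key, "<unset>")
        b = right.get(key, "<unset>")
        if a != b:
            differences[key] = (a, b)
    return differences


# Hypothetical values for illustration only; they are not taken from the post.
notebook = [("spark.master", "local[*]"),
            ("spark.app.name", "pyspark-shell")]
submit = [("spark.master", "local[*]"),
          ("spark.app.name", "lr_job"),
          ("spark.driver.memory", "8g")]

for key, (in_notebook, in_submit) in diff_spark_conf(notebook, submit).items():
    print(f"{key}: notebook={in_notebook} spark-submit={in_submit}")
```

In each environment the pairs could be collected with `spark.sparkContext.getConf().getAll()`, assuming `spark` is the active SparkSession, then saved and compared with the helper.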