Running Scala code in the Spark shell

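Hi, I am new to Spark and Scala. I am running Scala code at the Spark Scala prompt. The program is accepted without errors and the shell shows "defined module MLlib", but nothing is printed to the screen. What am I doing wrong? Is there another way to run this program in the Spark Scala shell and get the output?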

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

object MLlib {

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName(s"Book example: Scala")
    val sc = new SparkContext(conf)

    // Load 2 types of emails from text files: spam and ham (non-spam).
    // Each line has text from one email.
    val spam = sc.textFile("/home/training/Spam.txt")
    val ham = sc.textFile("/home/training/Ham.txt")

    // Create a HashingTF instance to map email text to vectors of 100 features.
    val tf = new HashingTF(numFeatures = 100)
    // Each email is split into words, and each word is mapped to one feature.
    val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
    val hamFeatures = ham.map(email => tf.transform(email.split(" ")))

    // Create LabeledPoint datasets for positive (spam) and negative (ham) examples.
    val positiveExamples = spamFeatures.map(features => LabeledPoint(1,features))
    val negativeExamples = hamFeatures.map(features => LabeledPoint(0,features))
    val trainingData = positiveExamples ++ negativeExamples
    trainingData.cache() // Cache data since Logistic Regression is an iterative algorithm.

    // Create a Logistic Regression learner which uses the SGD optimizer.
    val lrLearner = new LogisticRegressionWithSGD()
    // Run the actual learning algorithm on the training data.
    val model = lrLearner.run(trainingData)

    // Test on a positive example (spam) and a negative one (ham).
    // First apply the same HashingTF feature transformation used on the training data.
    val posTestExample = tf.transform("O M G GET cheap stuff by sending money to ...".split(" "))
    val negTestExample = tf.transform("Hi Dad, I started studying Spark the other ...".split(" "))
    // Now use the learned model to predict spam/ham for new emails.
    println(s"Prediction for positive test example: ${model.predict(posTestExample)}")
    println(s"Prediction for negative test example: ${model.predict(negTestExample)}")

    sc.stop()
  }
}

Solution

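A few things:

You defined the object in the Spark shell, so its main method is not called automatically. After defining the object, you have to call it explicitly: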

MLlib.main(Array())

In fact, if you keep working in the shell/REPL, you can drop the object entirely and define the function directly. For example:

import org.apache.spark.{SparkConf,SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

def MLlib {
    //the rest of your code
}

However, you should not initialize a SparkContext in the shell. From the documentation:

In the Spark shell, a special interpreter-aware SparkContext is
already created for you, in the variable called sc. Making your own
SparkContext will not work.

So you must either remove that part from your code, or compile the program into a jar and run it with spark-submit.
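
As a rough sketch of the first option, the training code could be pasted into the shell as a plain function that works on RDDs built from the shell's own sc, with no SparkConf, no new SparkContext, and no sc.stop(). The helper name trainSpamModel and its parameters are hypothetical (they are not in the original post), and this assumes a Spark version in which LogisticRegressionWithSGD still has a public constructor, as the question's code does:

```scala
// Intended for spark-shell, where `sc` already exists.
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD}
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper: trains a spam classifier from two RDDs of raw email text.
// `tf` is passed in so the same feature mapping can be reused for prediction.
def trainSpamModel(tf: HashingTF, spam: RDD[String], ham: RDD[String]): LogisticRegressionModel = {
  // Label spam as 1.0 and ham as 0.0, as in the question's code.
  val positive = spam.map(email => LabeledPoint(1.0, tf.transform(email.split(" "))))
  val negative = ham.map(email => LabeledPoint(0.0, tf.transform(email.split(" "))))
  // Cache because the SGD-based learner makes several passes over the data.
  val trainingData = (positive ++ negative).cache()
  new LogisticRegressionWithSGD().run(trainingData)
}

// Possible usage in the shell, with the paths from the question:
//   val tf = new HashingTF(numFeatures = 100)
//   val model = trainSpamModel(tf, sc.textFile("/home/training/Spam.txt"), sc.textFile("/home/training/Ham.txt"))
//   println(model.predict(tf.transform("O M G GET cheap stuff by sending money to ...".split(" "))))
```

Because the println now runs at the top level of the shell rather than inside an uncalled main, the prediction should appear as soon as that line is entered.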
