Problem description
Apache Spark provides a TF-IDF implementation: https://spark.apache.org/docs/latest/ml-features.html#tf-idf
Running the example adds the "rawFeatures" and "features" columns and outputs the following DataFrame:
|-------|-------------|------------|-----------------------------|------------------------------|
| label | sentence    | words      | rawFeatures                 | features                     |
|-------|-------------|------------|-----------------------------|------------------------------|
| 0     | Hi...       | ["hi",...] | [0,32,[1,12,16,22,28],1,1]] | [0,28,[0.69,0.69,0.29,0.29]] |
| 0     | I wish...   | [...]      | [0,[11,15,29,31],1]]        | [0,1]]                       |
| 1     | Logistic... | [...]      | [0,[3,4,27,30],1]]          | [0,0.69]]                    |
|-------|-------------|------------|-----------------------------|------------------------------|
I have two questions:
|-----------------------|
| word | label | TF-IDF |
|-----------------------|
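Basically, I want a DataFrame with multiple rows per word, each row showing the label the word appears under and its TF-IDF value.
Thanks in advance :)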
Solution
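One possible starting point, offered as a sketch rather than a confirmed answer: HashingTF maps words to hashed indices, so the words cannot be read back out of "rawFeatures" or "features". In Spark itself, CountVectorizer keeps a vocabulary and may be a better fit for that reason. The plain-Python code below does not use Spark at all. It only illustrates which rows the word | label | TF-IDF table would contain, assuming the smoothed formula from the Spark documentation, IDF = log((|D| + 1) / (DF + 1)). The sample sentences, the whitespace tokenization, and the function name tfidf_rows are assumptions made for illustration.

```python
import math
from collections import Counter

# Illustrative (label, sentence) pairs shaped like the example DataFrame.
docs = [
    (0, "Hi I heard about Spark"),
    (0, "I wish Java could use case classes"),
    (1, "Logistic regression models are neat"),
]


def tfidf_rows(docs):
    """Return (word, label, tfidf) tuples, one per distinct word per document."""
    # Lowercase and split on whitespace (a simplifying assumption).
    tokenized = [(label, sentence.lower().split()) for label, sentence in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each word.
    df = Counter(word for _, words in tokenized for word in set(words))

    rows = []
    for label, words in tokenized:
        tf = Counter(words)  # raw term counts within this document
        for word, count in tf.items():
            # Smoothed IDF as documented for Spark: log((|D| + 1) / (DF + 1)).
            idf = math.log((n_docs + 1) / (df[word] + 1))
            rows.append((word, label, count * idf))
    return rows


rows = tfidf_rows(docs)
for word, label, score in rows:
    print(f"{word:<12} {label:<5} {score:.2f}")
```

With three documents, a word found in one document scores ln(4/2) ≈ 0.69 and a word found in two documents scores ln(4/3) ≈ 0.29, which agrees with the values visible in the "features" column above.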