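Finding the set of keys in a Scala Map whose values overlap

Problem description

I am working with a Map in Scala where each key is a basket ID and each value is the set of item IDs contained in that basket. The goal is to take this Map and compute, for every basket, the set of other basket IDs that share at least one item with it.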

Say the input Map is:

val basket = Map("b1" -> Set("i1","i2","i3"),"b2" -> Set("i2","i4"),"b3" -> Set("i3","i5"),"b4" -> Set("i6"))

Is it possible to perform this computation in Spark so that I get back the intersecting-basket information?

Thanks!

Solution

Something like...

val basket = Map("b1" -> Set("i1","i2","i3"),"b2" -> Set("i2","i4"),"b3" -> Set("i3","i5"),"b4" -> Set("i6"))

// Returns the keys of every entry in `map` whose value set shares
// at least one element with `set`.
def intersectKeys(set: Set[String], map: Map[String, Set[String]]): Set[String] =
  map.filter { case (_, v) => set.intersect(v).nonEmpty }.keySet

// each set picks up its own key, which we don't want, so we subtract it back out
val intersects = basket.map { case (k,v) => (k,intersectKeys(v,basket) - k) }
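
The approach above compares every basket against every other basket. As a rough alternative sketch (not part of the original answer), one could first invert the Map into an item-to-baskets index and then look up each basket's items in that index. The object name `BasketOverlap` and the helpers `invert` and `overlaps` are hypothetical names chosen for illustration. This is plain Scala collections code; the same flatMap-then-group shape is what one would probably reach for on a Spark pair RDD, but that translation is an assumption and is not shown here.

```scala
object BasketOverlap {
  // Build an inverted index: item ID -> set of basket IDs containing that item.
  def invert(baskets: Map[String, Set[String]]): Map[String, Set[String]] =
    baskets.toSeq
      .flatMap { case (basket, items) => items.toSeq.map(item => item -> basket) }
      .groupBy { case (item, _) => item }
      .map { case (item, pairs) => item -> pairs.map(_._2).toSet }

  // For each basket, union the baskets of all its items, then drop the basket itself.
  def overlaps(baskets: Map[String, Set[String]]): Map[String, Set[String]] = {
    val byItem = invert(baskets)
    baskets.map { case (basket, items) =>
      val sharing = items.flatMap(item => byItem.getOrElse(item, Set.empty[String]))
      basket -> (sharing - basket)
    }
  }
}
```

With the `basket` Map from the question, `BasketOverlap.overlaps(basket)` gives b1 paired with b2 and b3, b2 and b3 each paired with b1, and b4 paired with an empty set.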