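Problem description
Hello Stack Overflow,
I am trying to understand a query that uses Pearson similarity. What are nom and denom? What do r1: r1 and r2: r2 mean? I also don't understand what r.r1.rating and r.r2.rating refer to.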
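// Mean rating of the target user
MATCH (u1:User {id: 3})-[r:RATED]->(m:Movie)
WITH u1, avg(r.rating) AS u1_mean
// Pair up the ratings of movies that both u1 and u2 rated; keep users with more than 10 in common
MATCH (u1)-[r1:RATED]->(m:Movie)<-[r2:RATED]-(u2)
WITH u1, u1_mean, u2, COLLECT({r1: r1, r2: r2}) AS ratings WHERE size(ratings) > 10
// Mean rating of each candidate user (u1_mean and u2 must be carried through this WITH)
MATCH (u2)-[r:RATED]->(m:Movie)
WITH u1, u1_mean, u2, avg(r.rating) AS u2_mean, ratings
UNWIND ratings AS r
// Pearson numerator and denominator over the shared movies
WITH sum( (r.r1.rating - u1_mean) * (r.r2.rating - u2_mean) ) AS nom,
     sqrt( sum( (r.r1.rating - u1_mean)^2 ) * sum( (r.r2.rating - u2_mean)^2 ) ) AS denom,
     u1, u2 WHERE denom <> 0
WITH u1, u2, nom/denom AS pearson
ORDER BY pearson DESC LIMIT 10
// Score movies u1 has not rated, weighted by each neighbour's similarity
MATCH (u2)-[r:RATED]->(m:Movie) WHERE NOT EXISTS( (u1)-[:RATED]->(m) )
RETURN m.name, SUM( pearson * r.rating ) AS score
ORDER BY score DESC LIMIT 25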
The output is as follows:
│"m.name"│"score"│
│"Sleepless in Seattle"│25.859451877376813│
│"The Tunnel"│22.652532472101605│
│"Beetlejuice"│22.21835919736008│
│"Shriek If You Know What.."│21.935357890253528│
│"Dawn of the Dead"│21.421377433824798│
│"The Prisoner of Zenda"│21.225502683325033│
│"The Talented Mr. Ripley"│20.83938743140176│
Any suggestions would be appreciated.
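Solution
Pearson's formula is described here: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#For_a_sample
nom is simply the numerator of that formula, defined in the query as: WITH sum( (r.r1.rating-u1_mean) * (r.r2.rating-u2_mean) ) AS nom,
Likewise, denom is the denominator: the square root of the product of the two sums of squared deviations.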
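To make the correspondence concrete, the query's expressions can be lined up against the sample formula roughly as below. The symbols S, x_i, y_i, \bar{x} and \bar{y} are notation introduced here for illustration only: S is the set of movies both users rated, x_i stands for r.r1.rating, y_i for r.r2.rating, \bar{x} for u1_mean and \bar{y} for u2_mean.

```latex
\[
\text{pearson} \;=\; \frac{\text{nom}}{\text{denom}}
\;=\; \frac{\sum_{i \in S} (x_i - \bar{x})\,(y_i - \bar{y})}
           {\sqrt{\sum_{i \in S} (x_i - \bar{x})^2 \;\cdot\; \sum_{i \in S} (y_i - \bar{y})^2}}
\]
```

One caveat when comparing with the Wikipedia page: in the query, u1_mean and u2_mean are averaged over every movie each user rated, while the sums run only over the shared movies, so this is a common recommender-style variant and not the textbook sample coefficient term for term.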
I'm not too sure about the other two questions, but I hope this helps!
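For the r1/r2 part, a Python analogy may help, offered as a sketch and not as a definitive reading of the query. In Cypher, {r1: r1, r2: r2} is a map literal: the key r1 holds the RATED relationship from u1 and the key r2 holds the RATED relationship from u2, both for the same movie. COLLECT gathers one such map per shared movie into the list ratings, and UNWIND ratings AS r turns each map back into a row, so r.r1.rating is u1's rating and r.r2.rating is u2's rating. The numbers, the assumed means and the function name pearson below are made up for illustration.

```python
from math import sqrt

# Hypothetical data: each dict plays the role of one {r1: r1, r2: r2} map,
# i.e. the pair of RATED relationships for one movie both users rated.
ratings = [
    {"r1": {"rating": 5.0}, "r2": {"rating": 4.0}},
    {"r1": {"rating": 3.0}, "r2": {"rating": 2.0}},
    {"r1": {"rating": 1.0}, "r2": {"rating": 1.0}},
]

# Assumed values; in the query these are averaged over all of each
# user's ratings, not only the shared movies.
u1_mean = 3.0
u2_mean = 2.5


def pearson(ratings, u1_mean, u2_mean):
    """Mirror the nom/denom computation; return None when denom is 0."""
    # Iterating over the list corresponds to UNWIND ratings AS r.
    nom = sum(
        (r["r1"]["rating"] - u1_mean) * (r["r2"]["rating"] - u2_mean)
        for r in ratings
    )
    denom = sqrt(
        sum((r["r1"]["rating"] - u1_mean) ** 2 for r in ratings)
        * sum((r["r2"]["rating"] - u2_mean) ** 2 for r in ratings)
    )
    if denom == 0:  # corresponds to WHERE denom <> 0 in the query
        return None
    return nom / denom


print(pearson(ratings, u1_mean, u2_mean))  # about 0.973 for this toy data
```

With these toy values nom is 6 and denom is sqrt(38), so the two users come out as strongly similar. The query keeps the 10 most similar users and uses each one's pearson value as a weight on their ratings in the final SUM.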