php – Detecting spammers with MySQL

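I'm seeing more and more users register on my site just to send repetitive spam messages to other users. I added some server-side code that detects duplicate messages with the following MySQL query: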

  SELECT COUNT(content) AS msgs_sent
    FROM messages
   WHERE sender_id = ?   -- bind $sender_id as a parameter (e.g. PDO/mysqli) instead of concatenating it
GROUP BY content
  HAVING COUNT(content) > 10
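The duplicate check above can be tried out without a MySQL server in a minimal sketch. It uses Python's built-in sqlite3 module only as a stand-in (the GROUP BY ... HAVING logic is the same), assumes a hypothetical messages(sender_id, content) table, and passes sender_id as a bound parameter instead of splicing it into the SQL string.

```python
import sqlite3


def duplicate_counts(conn, sender_id, threshold=10):
    """Return (content, times_sent) for every message this sender repeated more than `threshold` times."""
    rows = conn.execute(
        """
          SELECT content, COUNT(content) AS msgs_sent
            FROM messages
           WHERE sender_id = ?
        GROUP BY content
          HAVING COUNT(content) > ?
        """,
        (sender_id, threshold),  # bound parameters, never string concatenation
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; the real table presumably has more columns.
    conn.execute("CREATE TABLE messages (sender_id INTEGER, content TEXT)")
    # Hypothetical data: sender 1 sends the same text 12 times, sender 2 behaves normally.
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?)",
        [(1, "Buy cheap watches!")] * 12 + [(2, "Hi, how are you?"), (2, "See you later")],
    )
    print(duplicate_counts(conn, 1))  # [('Buy cheap watches!', 12)]
    print(duplicate_counts(conn, 2))  # []
```

As the question notes, this only catches byte-identical messages; changing a single character puts the message in a different group.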

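The query works well, but now they get around it by changing a few characters in their messages. Is there a way to detect this with MySQL, or do I need to go through each group returned from MySQL and use PHP to calculate a similarity percentage?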

Any ideas or suggestions?
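The PHP-side fallback mentioned in the question (fetch one sender's messages, then score how similar they are to each other) can be sketched as follows. It is written in Python, with the standard-library difflib.SequenceMatcher standing in for PHP's similar_text() or levenshtein(); the normalize step, the 0.85 threshold and the function names are illustrative assumptions, not tuned values.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and drop everything except letters and digits, so punctuation/spacing tweaks vanish."""
    return re.sub(r"[^0-9a-z]+", "", text.lower())


def similarity(a, b):
    """Similarity ratio in [0, 1] of the normalized texts; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def largest_near_duplicate_group(messages, min_ratio=0.85):
    """Size of the biggest set of messages that are at least `min_ratio` similar to one seed message."""
    best = 0
    for seed in messages:
        # The seed always matches itself, so every group has size >= 1.
        group = sum(1 for m in messages if similarity(seed, m) >= min_ratio)
        best = max(best, group)
    return best


if __name__ == "__main__":
    msgs = [
        "Buy cheap watches at example.com!",
        "Buy cheep watches at example.com!!",
        "BUY cheap w4tches at example.com",
        "Hello, nice to meet you",
    ]
    print(largest_near_duplicate_group(msgs))  # 3
```

A sender could then be flagged when this group size exceeds some limit, analogous to the `> 10` in the original query. Note the comparison is quadratic in the number of messages, so it is best run on a recent window of one sender's messages rather than the whole table.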

Solution:

Full-text matching

You could look at an implementation along the lines of the MATCH example here:

mysql> SELECT id, body, MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root') AS score
    -> FROM articles WHERE MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root');
+----+-------------------------------------+-----------------+
| id | body                                | score           |
+----+-------------------------------------+-----------------+
|  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
|  6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)

So for your case, perhaps:

SELECT id, MATCH (content) AGAINST ('your string') AS score
FROM messages
WHERE MATCH (content) AGAINST ('your string')
HAVING score > 1;

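Note that to use these functions, your content column must have a FULLTEXT index (for example, ALTER TABLE messages ADD FULLTEXT (content);). Before MySQL 5.6, FULLTEXT indexes were only supported on MyISAM tables.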

What is the score in this example?

It is a relevance value, calculated by the process described below:

Every correct word in the collection and in the query is weighted
according to its significance in the collection or query.
Consequently, a word that is present in many documents has a lower
weight (and may even have a zero weight), because it has lower
semantic value in this particular collection. Conversely, if the word
is rare, it receives a higher weight. The weights of the words are
combined to compute the relevance of the row.

(From the MySQL documentation page.)
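The weighting idea in the quote (rare words count more, words found in many rows count little or nothing) can be illustrated with a deliberately simplified inverse-document-frequency score. This is not MySQL's actual relevance formula, which also accounts for term frequency and row length; the log((N - n) / n) weight, clamped at zero, and the helper names below are assumptions chosen only to mirror the "common words may get zero weight" behaviour described above.

```python
import math
import re


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(documents):
    """Simplified IDF: weight = log((N - n) / n), clamped at 0, where n = number of documents containing the word."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        for word in set(tokenize(doc)):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    # A word in every document would give log(0), so it is assigned 0 directly;
    # a word in more than half the documents gets a negative log, clamped to 0.
    return {
        word: max(0.0, math.log((n_docs - n) / n)) if n < n_docs else 0.0
        for word, n in doc_freq.items()
    }


def relevance(query, document, weights):
    """Sum the weights of the distinct query words that also appear in the document."""
    doc_words = set(tokenize(document))
    return sum(weights.get(w, 0.0) for w in set(tokenize(query)) if w in doc_words)


if __name__ == "__main__":
    docs = [
        "cheap watches for sale",
        "meeting at noon for lunch",
        "cheap flights for sale",
        "see you at noon",
    ]
    weights = idf_weights(docs)
    for doc in docs:
        print(round(relevance("cheap watches for sale", doc, weights), 3), doc)
```

Here "for" appears in three of the four documents and ends up with zero weight, while "watches" appears in only one and dominates the score, which is the effect the documentation describes.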
