Is there an existing library or software for unsupervised pattern discovery across multiple strings?

Problem description

strace is a command that traces system calls and signals. Sample output:

poll([{fd=11,events=POLLIN},{fd=12,events=POLLIN},{fd=30,events=POLLIN},{fd=31,events=POLLIN},{fd=104,events=POLLIN}],5,11) = 0 (Timeout)
recvmsg(30,{msg_namelen=0},0)         = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f946e0c56e8,FUTEX_WAKE_PRIVATE,2147483647) = 1
futex(0x7f946e0c5698,1) = 1
recvmsg(30,0)         = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31,0)         = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=11,0) = 0 (Timeout)
recvmsg(30,0)         = -1 EAGAIN (Resource temporarily unavailable)

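This output is highly patterned. Is there existing software that can read input like the above and identify the patterns in it without supervision, producing something like: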

====================================================
Patterns
====================================================

Pattern 1 (P1): {fd=$1,events=$2}

P2: P1
    where $2=POLLIN

P3: [P2a,P2b,P2c,P2d,P2e]
    where a.$1=$1,b.$1=$2,c.$1=$3,d.$1=$4,e.$1=$5

P4: poll(P3,$6,$7) = 0 (Timeout)
    where P3.$1=$1 P3.$2=$2 P3.$3=$3 P3.$4=$4 P3.$5=$5

P6: recvmsg($1,0)         = -1 EAGAIN (Resource temporarily unavailable)

P7: futex($1,$2) = 1

P8:
    P4a
    P6b
    P7c
    P7d
    P6e
    P6f
    where a.$1=$1 a.$2=$2 ...


====================================================
Output
====================================================

P8 where $1=11,$2=12,...
P8 where $1=11,...

Does anything like this already exist as a ready-to-use, unsupervised tool?
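Short of a ready-made tool, the first level of the wished-for output (per-line patterns such as P6 and P7) can be approximated with a simple log-template heuristic: mask every numeric or hex token, group lines that share the same masked shape, then put back the slots whose value never changes within a group. Counting repeated n-grams of the resulting template ids is a crude stand-in for a sequence pattern like P8. This is a minimal sketch under stated assumptions: only decimal integers and hex addresses are treated as variables, the names to_template, refine and discover are hypothetical (not an existing library), and it does not infer nested structure such as P1 to P3. The sample input is a few recvmsg/futex lines from the excerpt above, rearranged.

```python
import re
from collections import Counter, defaultdict

# Assumption: "variable" tokens are hex addresses or decimal integers.
_VALUE = re.compile(r"\b0x[0-9a-fA-F]+\b|-?\b\d+\b")
# Placeholder syntax $1, $2, ... (assumes the raw input contains no "$<digits>").
_SLOT = re.compile(r"\$(\d+)")


def to_template(line):
    """Mask every numeric token; return (template, list of masked values)."""
    values = []

    def mask(m):
        values.append(m.group(0))
        return f"${len(values)}"

    return _VALUE.sub(mask, line.strip()), values


def refine(template, rows):
    """Restore slots whose value is constant across rows; renumber the rest."""
    new_index = {}

    def fill(m):
        k = int(m.group(1)) - 1
        column = {row[k] for row in rows}
        if len(column) == 1:
            return column.pop()  # constant slot: put the literal value back
        new_index[k] = len(new_index) + 1
        return f"${new_index[k]}"

    return _SLOT.sub(fill, template)


def discover(lines, n=2):
    """Return (patterns by id, template-id sequence, repeated id n-grams)."""
    groups = defaultdict(list)  # raw template -> value rows
    ids = {}                    # raw template -> pattern id
    seq = []
    for line in lines:
        template, values = to_template(line)
        groups[template].append(values)
        seq.append(ids.setdefault(template, len(ids)))
    patterns = {ids[t]: refine(t, rows) for t, rows in groups.items()}
    grams = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    repeats = {g: c for g, c in grams.items() if c > 1}
    return patterns, seq, repeats


sample = [
    "recvmsg(30,0)         = -1 EAGAIN (Resource temporarily unavailable)",
    "recvmsg(31,0)         = -1 EAGAIN (Resource temporarily unavailable)",
    "futex(0x7f946e0c5698,1) = 1",
    "recvmsg(30,0)         = -1 EAGAIN (Resource temporarily unavailable)",
    "futex(0x7f946e0c5698,1) = 1",
]
patterns, seq, repeats = discover(sample)
for pid, pat in patterns.items():
    print(f"P{pid}: {pat}")
print("sequence:", seq)
print("repeated pairs:", repeats)
```

On this sample the recvmsg lines collapse to recvmsg($1,0)         = -1 EAGAIN (Resource temporarily unavailable), the same shape as P6, and the (recvmsg, futex) pair is reported as occurring twice. Real-world log parsers use more robust tokenization, but the grouping idea is the same.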

Solution

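Compression algorithms are a good example: they find repeated structure in their input without any supervision. See the theory and implementations of deflate, gzip, xz and lzma.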
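To make this concrete: deflate (used by gzip and zlib) is built on LZ77, which replaces text that already occurred earlier with a (distance, length) back-reference. The toy greedy parser below is a simplified illustration of that idea, not deflate itself (no Huffman coding, no hash chains, quadratic search); lz77_tokens and lz77_decode are hypothetical names, and the input is two recvmsg lines from the question repeated three times. The only real library call is zlib.compress, used for a size comparison.

```python
import zlib


def lz77_tokens(data, window=4096, min_len=4):
    """Greedy LZ77 parse of a string.

    Emits ("lit", ch) for a literal character, or ("ref", distance, length)
    when the upcoming text repeats something within the last `window` chars.
    """
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # A match may run past position i (overlapping copy), as in LZ77.
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            tokens.append(("ref", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decode(tokens):
    """Rebuild the original string from lz77_tokens output."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # char by char so overlapping refs work
    return "".join(out)


# Illustrative input: two recvmsg lines from the excerpt, repeated three times.
trace = (
    "recvmsg(30,0)         = -1 EAGAIN (Resource temporarily unavailable)\n"
    "recvmsg(31,0)         = -1 EAGAIN (Resource temporarily unavailable)\n"
) * 3

tokens = lz77_tokens(trace)
assert lz77_decode(tokens) == trace  # the parse is lossless

for tok in tokens:
    if tok[0] == "ref":
        print(f"repeat: {tok[2]} chars copied from {tok[1]} chars back")

raw = trace.encode()
print(len(raw), "bytes raw,", len(zlib.compress(raw, 9)), "bytes with zlib (deflate)")
```

The longest back-reference covers most of the repeated two-line block, so the repetition is detected without being told what to look for. What LZ77-style compressors do not give you is a named, parameterized grammar like P1 to P8; that would still have to be built on top.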