正则按照特定规则拆分聊天记录

将聊天记录拆成对应的每句话的role,time,message的格式

<?PHP
$data = '[S][04:39:35] [#1][V][04:39:35] 不忈: Hello[V][04:39:35] 不忈: Hello[V][04:39:35] 不忈: Hello[V][04:40:11] 不忈: 你好[A][04:40:38] agent01: 嗯[S][04:40:54] [#8]给你[S][04:40:57] agent03 [#6][S][04:41:00] agent01 [#7][A][04:41:02] agent03: hade [V][04:41:06] 不忈: 一[A][04:41:10] agent03: en [S][04:41:20] [#8]-[S][04:41:22] agent01 [#6][S][04:41:24] agent03 [#7][A][04:41:27] agent01: 32423[V][04:41:29] 不忈: 。[S][04:41:31] [#2]agent01';
preg_match_all('/(\[S\](.*?))(?=\[[SVA]\])|(\[S\](.*?))$|(\[V\](.*?))(?=\[[SVA]\])|(\[V\](.*?))$|(\[A\](.*?))(?=\[[SVA]\])|(\[A\](.*?))$/',$data,$matches);
$data1 = $matches[0]; 
foreach($matches[0] as $k => $v){
	preg_match_all('/\[\w\]|\[\d{2}:\d{2}:\d{2}\]/',$v,$vMatch);
	$res[$k]['role'] = $vMatch[0][0];
	$res[$k]['time'] = $vMatch[0][1];
	$res[$k]['message'] = preg_filter('/\[(S|A|V)\]|\[\d{2}:\d{2}:\d{2}\]/','',$v);
}
print_r($res);exit;

拆分出来后:

Array
(
    [0] => Array
        (
            [role] => [S]
            [time] => [04:39:35]
            [message] =>  [#1]
        )

    [1] => Array
        (
            [role] => [V]
            [time] => [04:39:35]
            [message] =>  不忈: Hello
        )

    [2] => Array
        (
            [role] => [V]
            [time] => [04:39:35]
            [message] =>  不忈: Hello
        )

    [3] => Array
        (
            [role] => [V]
            [time] => [04:39:35]
            [message] =>  不忈: Hello
        )

    [4] => Array
        (
            [role] => [V]
            [time] => [04:40:11]
            [message] =>  不忈: 你好
        )

    [5] => Array
        (
            [role] => [A]
            [time] => [04:40:38]
            [message] =>  agent01: 嗯
        )

    [6] => Array
        (
            [role] => [S]
            [time] => [04:40:54]
            [message] =>  [#8]给你
        )

    [7] => Array
        (
            [role] => [S]
            [time] => [04:40:57]
            [message] =>  agent03 [#6]
        )

    [8] => Array
        (
            [role] => [S]
            [time] => [04:41:00]
            [message] =>  agent01 [#7]
        )

    [9] => Array
        (
            [role] => [A]
            [time] => [04:41:02]
            [message] =>  agent03: hade 
        )

    [10] => Array
        (
            [role] => [V]
            [time] => [04:41:06]
            [message] =>  不忈: 一
        )

    [11] => Array
        (
            [role] => [A]
            [time] => [04:41:10]
            [message] =>  agent03: en 
        )

    [12] => Array
        (
            [role] => [S]
            [time] => [04:41:20]
            [message] =>  [#8]-
        )

    [13] => Array
        (
            [role] => [S]
            [time] => [04:41:22]
            [message] =>  agent01 [#6]
        )

    [14] => Array
        (
            [role] => [S]
            [time] => [04:41:24]
            [message] =>  agent03 [#7]
        )

    [15] => Array
        (
            [role] => [A]
            [time] => [04:41:27]
            [message] =>  agent01: 32423
        )

    [16] => Array
        (
            [role] => [V]
            [time] => [04:41:29]
            [message] =>  不忈: 。
        )

    [17] => Array
        (
            [role] => [S]
            [time] => [04:41:31]
            [message] =>  [#2]agent01
        )

)

相关文章

正则替换html代码中img标签的src值在开发富文本信息在移动端...
正则表达式
AWK是一种处理文本文件的语言,是一个强大的文件分析工具。它...
正则表达式是特殊的字符序列,利用事先定义好的特定字符以及...
Python界一名小学生,热心分享编程学习。
收集整理每周优质开发者内容,包括、、等方面。每周五定期发...