Parsing Nginx ingress access logs in Fluentd with the multi-format parser and a regexp

Problem description

I have an Nginx ingress controller in a K8s cluster with the following log format (taken from /etc/nginx/nginx.conf inside the container):

log_format upstreaminfo '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id';

My goal is to parse the Nginx logs and push them to CloudWatch. Note that the Nginx log file contains the Nginx application logs (e.g. info and warning messages) as well as the access logs, so my understanding is that I have to use the multi-format parser plugin. I therefore configured Fluentd as follows (see the expression in the filter under the @nginx label):

    <source>
      @type tail
      @id in_tail_container_logs
      @label @containers
      path /var/log/containers/*.log
      exclude_path ["/var/log/containers/cloudwatch-agent*","/var/log/containers/fluentd*","/var/log/containers/nginx*"]
      pos_file /var/log/fluentd-containers.log.pos
      tag *
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_nginx_container_logs
      @label @nginx
      path /var/log/containers/nginx*.log
      pos_file /var/log/fluentd-nginx.log.pos
      tag *
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_cwagent_logs
      @label @cwagentlogs
      path /var/log/containers/cloudwatch-agent*
      pos_file /var/log/cloudwatch-agent.log.pos
      tag *
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <label @containers>
      <filter **>
        @type parser
        key_name log
        format json
        reserve_data true
      </filter>

      <filter **>
        @type kubernetes_metadata
        @id filter_kube_metadata
      </filter>

      <filter **>
        @type record_transformer
        @id filter_containers_stream_transformer
        <record>
          stream_name ${tag_parts[3]}
        </record>
      </filter>

      <filter **>
        @type concat
        key log
        multiline_start_regexp /^\S/
        separator ""
        flush_interval 5
        timeout_label @NORMAL
      </filter>

      <match **>
        @type relabel
        @label @NORMAL
      </match>
    </label>

    <label @nginx>
      <filter **>
        @type kubernetes_metadata
        @id filter_nginx_kube_metadata
      </filter>

      <filter **>
        @type record_transformer
        @id filter_nginx_containers_stream_transformer
        <record>
          stream_name ${tag_parts[3]}
        </record>
      </filter>

      <filter **>
        @type parser
        key_name log

        <parse>
          @type multi_format

          <pattern>
            format regexp
            expression /^(?<host>[^ ]*) (?<domain>[^ ]*) \[(?<x_forwarded_for>[^\]]*)\] (?<server_port>[^ ]*) - (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+[^\"])(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")? (?<request_length>[^ ]*) (?<request_time>[^ ]*) (?:\[(?<proxy_upstream_name>[^\]]*)\] )?(?:\[(?<proxy_alternative_upstream_name>[^\]]*)\] )?(?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) (?<request_id>[^ ]*)\n$/
          </pattern>
        </parse>
      </filter>


      <match **>
        @type relabel
        @label @NORMAL
      </match>
    </label>
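As an aside on the multi_format setup above: the plugin tries its pattern blocks in order and uses the first one that matches, so a catch-all pattern appended after the access-log regexp should keep the Nginx application log lines (info/warning messages, which have a completely different shape) from raising pattern-not-matched errors. A sketch of such a fallback, to go inside the multi_format parse block in the @nginx filter:

```
<pattern>
  format none
  message_key log
</pattern>
```

With this in place, any line the access-log regexp rejects is passed through unparsed under the log key instead of being dropped with an error.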

    <label @cwagentlogs>
      <filter **>
        @type kubernetes_metadata
        @id filter_kube_metadata_cwagent
      </filter>

      <filter **>
        @type record_transformer
        @id filter_cwagent_stream_transformer
        <record>
          stream_name ${tag_parts[3]}
        </record>
      </filter>

      <filter **>
        @type concat
        key log
        multiline_start_regexp /^\d{4}[-/]\d{1,2}[-/]\d{1,2}/
        separator ""
        flush_interval 5
        timeout_label @NORMAL
      </filter>

      <match **>
        @type relabel
        @label @NORMAL
      </match>
    </label>

    <label @NORMAL>
      <match **>
        @type cloudwatch_logs
        @id out_cloudwatch_logs_containers
        region "#{ENV.fetch('REGION')}"
        log_group_name "/aws/containerinsights/#{ENV.fetch('CLUSTER_NAME')}/application"
        log_stream_name_key stream_name
        remove_log_stream_name_key true
        auto_create_stream true
        <buffer>
          flush_interval 5
          chunk_limit_size 2m
          queued_chunks_limit_size 32
          retry_forever true
        </buffer>
      </match>
    </label>

Now I am seeing parser errors for log lines like the following:

...#0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data '10.0.1.2 - - [25/Aug/2020:11:43:09 +0000] \"GET /favicon.ico HTTP/1.1\" 499 0 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/ Firefox/79.0\" 901 0.000 [develop-api-8080] [] 10.0.2.3:8080 0 0.000 - 3a3d3bbd02a633aaaab2af3b5284a0c9\n'"
..."log"=>"10.0.1.2 - - [25/Aug/2020:11:43:09 +0000] \"GET /favicon.ico HTTP/1.1\" 499 0 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0\" 901 0.000 [develop-api-8080] [] 10.0.2.3:8080 0 0.000 - 3a3d3bbd02a633aaaab2af3b5284a0c9\n"

I am not sure whether the problem is in my regexp or somewhere else in the configuration. (Note that I have not added a parser for the Nginx application logs yet!) Thanks.
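For what it's worth, the expression in the @nginx filter appears to expect extra fields (a domain, an [x_forwarded_for] block, and a server_port) between the client address and the user field, which the upstreaminfo format above does not emit, so the sample line fails to match almost immediately. A minimal Python sketch of a regexp that does line up with upstreaminfo, tested against the failing line from the error dump (the group names follow the nginx variables; the regexp itself is my own, not one shipped with any plugin):

```python
import re

# Regexp aligned field-by-field with the upstreaminfo log_format
# from the question (a sketch, not a drop-in plugin pattern).
UPSTREAMINFO = re.compile(
    r'^(?P<remote_addr>[^ ]*) - (?P<remote_user>[^ ]*) '
    r'\[(?P<time_local>[^\]]*)\] "(?P<request>[^"]*)" '
    r'(?P<status>[^ ]*) (?P<body_bytes_sent>[^ ]*) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'(?P<request_length>[^ ]*) (?P<request_time>[^ ]*) '
    r'\[(?P<proxy_upstream_name>[^\]]*)\] '
    r'\[(?P<proxy_alternative_upstream_name>[^\]]*)\] '
    r'(?P<upstream_addr>[^ ]*) (?P<upstream_response_length>[^ ]*) '
    r'(?P<upstream_response_time>[^ ]*) (?P<upstream_status>[^ ]*) '
    r'(?P<req_id>[^ ]*)$'
)

# The line Fluentd failed to parse, taken from the error dump above.
line = ('10.0.1.2 - - [25/Aug/2020:11:43:09 +0000] "GET /favicon.ico HTTP/1.1" '
        '499 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) '
        'Gecko/20100101 Firefox/79.0" 901 0.000 [develop-api-8080] [] '
        '10.0.2.3:8080 0 0.000 - 3a3d3bbd02a633aaaab2af3b5284a0c9')

m = UPSTREAMINFO.match(line)
print(m.group('status'), m.group('proxy_upstream_name'), m.group('upstream_addr'))
# -> 499 develop-api-8080 10.0.2.3:8080
```

The same expression should drop into the Fluentd pattern block once the Python-style (?P<name>...) groups are rewritten as Ruby-style (?<name>...) groups.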

Workaround

This is not an answer as such, because I still think the regexp is not quite right. But since I have access to Nginx, I can simply change the log format to JSON and avoid parsing with a regexp altogether.
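Assuming the controller is the community kubernetes/ingress-nginx one, the access-log format can be switched to JSON through the controller's ConfigMap via the log-format-escape-json and log-format-upstream keys. A sketch (the ConfigMap name and namespace below are placeholders for my setup, and the field list can be trimmed or extended):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration   # placeholder: use your controller's ConfigMap name
  namespace: ingress-nginx    # placeholder namespace
data:
  # Escape variable values so each access-log line is valid JSON
  log-format-escape-json: "true"
  log-format-upstream: >-
    {"remote_addr":"$remote_addr","remote_user":"$remote_user",
    "time_local":"$time_local","request":"$request","status":"$status",
    "body_bytes_sent":"$body_bytes_sent","http_referer":"$http_referer",
    "http_user_agent":"$http_user_agent","request_length":"$request_length",
    "request_time":"$request_time","proxy_upstream_name":"$proxy_upstream_name",
    "upstream_addr":"$upstream_addr","upstream_status":"$upstream_status",
    "req_id":"$req_id"}
```

With the access log already in JSON, the filter under the @nginx label can use a plain json parser instead of multi_format with a regexp.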

