Ruby: intercepting a popen system call and logging stdout and stderr to the same file

Problem description

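In my Ruby code I run a system call with Open3.popen3 and use the resulting stdout and stderr IO objects to format log messages before writing them to a single log file. What is the best way to do this so that the log messages stay in the correct order? Note that I need to format error messages and standard output messages differently.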

Here is my current code (assume the logger is thread-safe):

Open3.popen3("my_custom_script with_some_args") do |_in,stdout,stderr|
  stdout_thr = Thread.new do
    # gets returns nil at EOF, so chomp only after the nil check
    while line = stdout.gets
      logger.info(format(:info,line.chomp))
    end
  end
  stderr_thr = Thread.new do
    while line = stderr.gets
      logger.error(format(:error,line.chomp))
    end
  end
  [stdout_thr,stderr_thr].each(&:join)
end

This has worked for me so far, but I am not sure I can guarantee that the log messages are in the correct order. Is there a better way?

Solution

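There is no way to guarantee what you are trying to achieve. The first thing to note is that your code can only order messages by when the data is received, not by when it was produced, and those are not quite the same thing. The only way to guarantee ordering is to do something at the source that imposes a guaranteed order across the two streams.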

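The following code should make the ordering "more likely" to be correct by removing the threads. Assuming you are using MRI, threads are constrained by the global VM lock (and were green threads in Ruby 1.8), so Ruby code in two threads never technically runs at the same time. With the threaded version you are therefore at the mercy of the scheduler choosing to run each thread at the "right" time.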

Open3.popen3("my_custom_script with_some_args") do |_in,stdout,stderr|
  for_reading = [stdout,stderr]
  until(for_reading.empty?) do
    wait_timeout = 1
    # IO.select blocks until one of the streams has something to read
    # or the wait timeout is reached. Its arguments are
    # (read_array, write_array, error_array, timeout).
    readable,_writable,_errors = IO.select(for_reading,[],[],wait_timeout)

    # readable is nil in the case of a timeout - loop back again
    if readable.nil?
      Thread.pass
    else
      # In the case that both streams are readable (and thus have content)
      # read from each of them. In this case, we cannot guarantee any order
      # because we receive the items at essentially the same time.
      # We can still ensure that we don't mix data incorrectly.
      readable.each do |stream|
        buffer = +''
        # loop through reading data until there is an EOF (value is nil)
        # or there is no more data to read right now (:wait_readable)
        loop do
          tmp = stream.read_nonblock(4096,exception: false)
          if tmp.nil?
            # stream is EOF - nothing more to read on that one..
            for_reading -= [stream]
            break
          elsif tmp == :wait_readable || tmp.empty?
            # nothing more to read right now...
            # continue on to process the buffer into lines and log them
            break
          end
          # append the chunk; passing buffer as the outbuf argument would
          # overwrite its contents on every read instead of accumulating
          buffer << tmp
        end

        if stream == stdout
          buffer.split("\n").each { |line| logger.info(format(:info,line)) }
        elsif stream == stderr
          buffer.split("\n").each { |line| logger.error(format(:error,line)) }
        end
      end
    end
  end
end
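
One caveat with splitting each buffer on "\n": a read can end in the middle of a line, in which case that line is logged as two separate messages. A minimal sketch of one way to handle this is below. LineAssembler is a hypothetical helper, not part of Ruby or Open3; it keeps the unfinished tail of each chunk until the rest of the line arrives. You would keep one instance per stream, call feed with each chunk, and call flush when that stream reaches EOF.

```ruby
# Hypothetical helper (not part of Ruby's standard library): accumulates raw
# chunks and hands back only the lines that are complete.
class LineAssembler
  def initialize
    @pending = +''
  end

  # Append a chunk and return the complete lines it finishes,
  # with the trailing newline removed.
  def feed(chunk)
    @pending << chunk
    lines = []
    while (idx = @pending.index("\n"))
      # slice! removes the line (including "\n") from the pending data
      lines << @pending.slice!(0..idx).chomp
    end
    lines
  end

  # Return the unterminated remainder (for use at EOF), or nil if none.
  def flush
    rest = @pending
    @pending = +''
    rest.empty? ? nil : rest
  end
end
```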

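Note that when the system produces a lot of output in a very short time, overlap that puts messages out of order becomes more likely. That likelihood grows with the time spent reading and processing the streams, so it is best to do the absolute minimum of work inside the loop. If formatting (and writing) is expensive, consider moving that work to a separate thread that reads from a single queue, and have the code inside the loop only push the buffer (and a source identifier) onto the queue.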
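
A minimal sketch of that queue-based arrangement follows. The run_and_log method name and the :stdout / :stderr source identifiers are assumptions made for illustration, and the logger is assumed to be a standard Logger. For brevity the consumer logs whatever lines it finds in each chunk, so a chunk that ends mid-line still produces two messages.

```ruby
require 'open3'
require 'logger'

# Hypothetical helper: the select loop only reads and enqueues; a single
# consumer thread does the (possibly expensive) formatting and logging.
def run_and_log(command, logger)
  queue = Queue.new

  consumer = Thread.new do
    # A nil item is the stop signal pushed once the reader has finished.
    while (item = queue.pop)
      source, chunk = item
      chunk.each_line do |line|
        if source == :stdout
          logger.info(line.chomp)
        else
          logger.error(line.chomp)
        end
      end
    end
  end

  Open3.popen3(command) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    sources = { stdout => :stdout, stderr => :stderr }
    until sources.empty?
      readable, = IO.select(sources.keys, nil, nil, 1)
      next if readable.nil? # timeout, try again

      readable.each do |stream|
        chunk = stream.read_nonblock(4096, exception: false)
        case chunk
        when nil then sources.delete(stream) # EOF on this stream
        when :wait_readable then next        # nothing to read right now
        else queue << [sources[stream], chunk]
        end
      end
    end
    wait_thr.value # Process::Status of the child
  end
ensure
  queue&.push(nil)
  consumer&.join
end
```

Because a single thread drains the queue, messages are logged in the order the reader received them, and the time spent inside the select loop stays small.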