VHDL RGB to YUV444 implementation mismatch

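Problem description

Design

I am trying to implement an RGB to YUV444 conversion algorithm in hardware, based on the following approximation, which I already have working in a C program: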

#define CLIP(X) ( (X) > 255 ? 255 : (X) < 0 ? 0 : X)
#define RGB2Y(R,G,B) CLIP(( (  66 * (R) + 129 * (G) +  25 * (B) + 128) >> 8) +  16)
#define RGB2U(R,G,B) CLIP(( ( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
#define RGB2V(R,G,B) CLIP(( ( 112 * (R) -  94 * (G) -  18 * (B) + 128) >> 8) + 128)

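Simulation/verification done so far

I simulated and verified the VHDL code with a self-checking approach. I compared the VHDL output against "golden" reference YUV444 values generated by the known-working C algorithm from several images, and all simulations pass.

Problem

When I implement this in hardware and insert it into the video pipeline, the video output looks fine in terms of sync: there are no visible problems with frame rate, flicker and so on. The colors, however, are wrong. There is a magenta/purple cast: yellow, for example, looks like light magenta, red looks somewhat darker, and so on.

I think it is quite likely that the U and V values are being clipped/saturated while Y (luma) works correctly, and that this is what I am seeing in the resulting video.

Code

Note that, for simplicity, I am only posting the part that does the conversion, together with the relevant signal declarations, types and functions. The rest of the code is just an AXI video stream signal wrapper. It works fine as far as video sync is concerned, so there are no video problems in that sense. If you think it would help, let me know and I will post it too.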

-- Functions:

  -- Absolute operation of: "op1 - op2 + op3" for the UV components
  function uv_op_abs(op1 : unsigned(15 downto 0); op2 : unsigned(15 downto 0); op3 : unsigned(15 downto 0))
  return unsigned is
    variable res1 : unsigned(15 downto 0);
  begin
    if op2 > op1 then
      res1 := (op2 - op1);
      if res1 > op3 then
        return res1 - op3;
      else 
        return op3 - res1;
      end if;
    else
      return (op1 - op2) + op3;
    end if;
  end uv_op_abs;

 function clip(mult_in : unsigned(15 downto 0))
  return unsigned is
  begin
    if to_integer(unsigned(mult_in)) > 240 then
      return unsigned(to_unsigned(240,8));
    else
      return unsigned(mult_in(7 downto 0));
    end if;
  end clip;

-- Signal/constant declarations:

  --Constants
  constant coeff_0 : unsigned(7 downto 0) := "01000010";  --66
  constant coeff_1 : unsigned(7 downto 0) := "10000001";  --129  
  constant coeff_2 : unsigned(7 downto 0) := "00011001";  --25 
  constant coeff_3 : unsigned(7 downto 0) := "00100110";  --38
  constant coeff_4 : unsigned(7 downto 0) := "01001010";  --74
  constant coeff_5 : unsigned(7 downto 0) := "01110000";  --112
  constant coeff_6 : unsigned(7 downto 0) := "01011110";  --94
  constant coeff_7 : unsigned(7 downto 0) := "00010010";  --18
  constant coeff_8 : unsigned(7 downto 0) := "10000000";  --128
  constant coeff_9 : unsigned(7 downto 0) := "00010000";  --16

  --Pipeline registers
  signal red_reg : unsigned(7 downto 0);
  signal green_reg : unsigned(7 downto 0);
  signal blue_reg : unsigned(7 downto 0);

  signal y_red_reg_op1 : unsigned(15 downto 0);
  signal y_green_reg_op1 : unsigned(15 downto 0);
  signal y_blue_reg_op1 : unsigned(15 downto 0);

  signal u_red_reg_op1 : unsigned(15 downto 0);
  signal u_green_reg_op1 : unsigned(15 downto 0);
  signal u_blue_reg_op1 : unsigned(15 downto 0);

  signal v_red_reg_op1 : unsigned(15 downto 0);
  signal v_green_reg_op1 : unsigned(15 downto 0);
  signal v_blue_reg_op1 : unsigned(15 downto 0);

  signal y_reg_op2 : unsigned(15 downto 0);
  signal u_reg_op2 : unsigned(15 downto 0);
  signal v_reg_op2 : unsigned(15 downto 0);

  signal y_reg_op3 : unsigned(7 downto 0);
  signal u_reg_op3 : unsigned(7 downto 0);
  signal v_reg_op3 : unsigned(7 downto 0);

-- YUV444 conversion process:

  RGB_YUV_PROC : process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        red_reg <= (others => '0');
        green_reg <= (others => '0');
        blue_reg <= (others => '0');
        y_red_reg_op1 <= (others => '0');
        y_green_reg_op1 <= (others => '0');
        y_blue_reg_op1 <= (others => '0');
        u_red_reg_op1 <= (others => '0');
        u_green_reg_op1 <= (others => '0');
        u_blue_reg_op1 <= (others => '0');
        v_red_reg_op1 <= (others => '0');
        v_green_reg_op1 <= (others => '0');
        v_blue_reg_op1 <= (others => '0');
        y_reg_op2 <= (others => '0');
        u_reg_op2 <= (others => '0');
        v_reg_op2 <= (others => '0');
        y_reg_op3 <= (others => '0');
        u_reg_op3 <= (others => '0');
        v_reg_op3 <= (others => '0');
        yuv444_out <= (others => '0');
        soff_sync <= '0';
      else

        --Sync with first video frame with the tuser (sof) input signal
        if rgb_sof_in = '1' then
          soff_sync <= '1';
        end if;

        --Fetch a pixel
        if (rgb_sof_in = '1' or soff_sync = '1') and rgb_valid_in = '1' and yuv444_ready_out = '1' and bypass = '0' then
          green_reg <= unsigned(rgb_in(7 downto 0));
          blue_reg <= unsigned(rgb_in(15 downto 8));
          red_reg <= unsigned(rgb_in(23 downto 16));
        end if;

        -- RGB to YUV conversion
        -- Y--> CLIP(( (  66 * (R) + 129 * (G) +  25 * (B) + 128) >> 8) +  16)
        -- U--> CLIP(( ( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
        -- V--> CLIP(( ( 112 * (R) -  94 * (G) -  18 * (B) + 128) >> 8) + 128)
        if (rgb_sof_in = '1' or soff_sync = '1') and (valid_delay = '1' or validff1 = '1') and yuv444_ready_out = '1' and bypass = '0' then
          --Y calc (  66 * (R) + 129 * (G) +  25 * (B) + 128) >> 8) +  16)
          y_red_reg_op1 <= coeff_0 * red_reg;
          y_green_reg_op1 <= coeff_1 * green_reg;
          y_blue_reg_op1 <= coeff_2 * blue_reg; 
          y_reg_op2 <=  y_red_reg_op1 + y_green_reg_op1 + y_blue_reg_op1 + (X"00" & coeff_8);
          y_reg_op3 <= (y_reg_op2(15 downto 8) + coeff_9);

          --U calc ( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
          u_red_reg_op1 <= coeff_3 * red_reg;
          u_green_reg_op1 <= coeff_4 * green_reg;
          u_blue_reg_op1 <= coeff_5 * blue_reg;
          u_reg_op2 <= uv_op_abs(u_blue_reg_op1,(u_red_reg_op1 + u_green_reg_op1),(X"00" & coeff_8));
          u_reg_op3 <= (u_reg_op2(15 downto 8) + coeff_8);

          --V calc ( 112 * (R) -  94 * (G) -  18 * (B) + 128) >> 8) + 128)
          v_red_reg_op1 <= coeff_5 * red_reg;
          v_green_reg_op1 <= coeff_6 * green_reg;
          v_blue_reg_op1 <= coeff_7 * blue_reg;
          v_reg_op2 <= uv_op_abs(v_red_reg_op1,(v_blue_reg_op1 + v_green_reg_op1),(X"00" & coeff_8));
          v_reg_op3 <= (v_reg_op2(15 downto 8) + coeff_8);

          --Output data
          yuv444_out <= std_logic_vector(v_reg_op3) & std_logic_vector(u_reg_op3) & std_logic_vector(y_reg_op3);
        elsif yuv444_ready_out = '1' and rgb_valid_in = '1' and bypass = '1' then
          yuv444_out <= rgb_in;
        end if;

      end if;
    end if;
  end process; -- RGB_YUV_PROC

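I also tried adding a 'clip' function to control overflow, thinking it would help the synthesis tool in the "clipping" cases, but it did not help and the problem persists: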

if (rgb_sof_in = '1' or soff_sync = '1') and (valid_delay = '1' or validff1 = '1') and yuv444_ready_out = '1' and bypass = '0' then
  --Y calc (  66 * (R) + 129 * (G) +  25 * (B) + 128) >> 8) +  16)
  y_red_reg_op1 <= coeff_0 * red_reg;
  y_green_reg_op1 <= coeff_1 * green_reg;
  y_blue_reg_op1 <= coeff_2 * blue_reg; 
  y_reg_op2 <=  y_red_reg_op1 + y_green_reg_op1 + y_blue_reg_op1 + (X"00" & coeff_8);
  y_reg_op3 <= clip( X"00" & (y_reg_op2(15 downto 8) + coeff_9));

  --U calc ( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
  u_red_reg_op1 <= coeff_3 * red_reg;
  u_green_reg_op1 <= coeff_4 * green_reg;
  u_blue_reg_op1 <= coeff_5 * blue_reg;
  u_reg_op2 <= uv_op_abs(u_blue_reg_op1,(u_red_reg_op1 + u_green_reg_op1),(X"00" & coeff_8));
  u_reg_op3 <= clip( X"00" & (u_reg_op2(15 downto 8) + coeff_8));

  --V calc ( 112 * (R) -  94 * (G) -  18 * (B) + 128) >> 8) + 128)
  v_red_reg_op1 <= coeff_5 * red_reg;
  v_green_reg_op1 <= coeff_6 * green_reg;
  v_blue_reg_op1 <= coeff_7 * blue_reg;
  v_reg_op2 <= uv_op_abs(v_red_reg_op1,(v_blue_reg_op1 + v_green_reg_op1),(X"00" & coeff_8));
  v_reg_op3 <= clip( X"00"& (v_reg_op2(15 downto 8) + coeff_8));

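Question

I know that in hardware design a successful simulation does not necessarily mean the design will behave the same way after synthesis, and I am sure there is a lot in this code that could be improved. There is something fundamentally wrong with this approach, but so far I cannot see it. Does anyone know what might be wrong, and why?

Answer

First thing to do: check whether you are using YUV or YCbCr. The two are often confused and they are not the same! Don't mix them up.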

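Then I see:

#define CLIP(X) ( (X) > 255 ? 255 : (X) < 0 ? 0 : X)

and

  function clip(mult_in : unsigned(15 downto 0))
  return unsigned is
  begin
    if to_integer(unsigned(mult_in)) > 240 then
      return unsigned(to_unsigned(240,8));
    else
      return unsigned(mult_in(7 downto 0));
    end if;
  end clip;

Those are very different functions. The first uses a signed data type and clips between 255 and 0. The second only clips positive values at 240, so any arithmetic overflow resulting from the subtractions will not be handled properly. For some reason you are using unsigned arithmetic throughout your code! (Why? What's wrong with signed?)

So you are already comparing apples with oranges.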
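To make the difference concrete, here is a minimal C sketch that puts the two clamps side by side. `clip_ref` mirrors the C macro. `clip_hw` is only a hypothetical software model of the VHDL `clip` (unsigned 16-bit input, upper limit 240, low byte returned otherwise). Both names are made up for illustration and do not come from the original code.

```c
#include <stdint.h>

/* Reference clamp, same behaviour as the CLIP macro:
 * signed input, result limited to 0..255. */
int clip_ref(int x)
{
    return x > 255 ? 255 : (x < 0 ? 0 : x);
}

/* Hypothetical model of the VHDL clip(): the input is unsigned, so a
 * negative intermediate result arrives as a large positive number and
 * ends up at the upper limit instead of at 0. */
uint8_t clip_hw(uint16_t x)
{
    return x > 240 ? 240 : (uint8_t)(x & 0xFF);
}
```

With these definitions, `clip_ref(-5)` yields 0, whereas `clip_hw((uint16_t)-5)` sees 65531 and yields 240. A lower bound cannot be expressed at all once the sign has been lost.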

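Next, you seem to be using an absolute-value function?! Why? That is not part of the original code at all, and it will of course produce artifacts. You cannot just flip the sign of negative values and expect them to be right.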
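A small C sketch shows what the sign flip does to one pixel. It assumes that `uv_op_abs` in the question effectively returns the magnitude of `op1 - op2 + op3`, which is how I read that function. `asr8`, `u_ref` and `u_abs` are names invented for this example.

```c
/* Floor division by 256: an arithmetic ">> 8" that is well defined
 * for negative values on every C compiler. */
int asr8(int x)
{
    return x >= 0 ? x / 256 : -((-x + 255) / 256);
}

/* U as computed by the reference macro. Clipping is omitted because
 * the result already lies in 16..240 for 8-bit inputs. */
int u_ref(int r, int g, int b)
{
    return asr8(-38 * r - 74 * g + 112 * b + 128) + 128;
}

/* Assumed model of the uv_op_abs() data path: take the magnitude of
 * the sum, shift right by 8, then add 128 in 8 bits. */
int u_abs(int r, int g, int b)
{
    int sum = 112 * b - (38 * r + 74 * g) + 128;
    if (sum < 0)
        sum = -sum;              /* the sign information is lost here */
    return ((sum >> 8) + 128) & 0xFF;
}
```

For yellow (255, 255, 0), `u_ref` gives 16 while `u_abs` gives 239. The value is mirrored around 128, pushing the pixel toward the blue end of the U axis, which would be consistent with the cast described in the question. For pure blue (0, 0, 255) both give 240, so only pixels with a negative chroma sum are affected.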

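Also, please use proper naming. A constant with the value 16 should not be named coeff_9; that makes the code hard to read and maintain. If you want flexibility, use a different structure entirely. A name like coeff_X tells you nothing: sure, it may be a coefficient, but what is it for?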

Actually, you could just write (note that I am already assuming signed arithmetic)

  y_red_reg_op1 <= red_reg * to_signed(66,8);

or, since red_reg'length is already 8, even

  y_red_reg_op1 <= red_reg * 66;

which is easier to read.

The code could then become something like this (again assuming you will be using signed)

  --U calc (( -38 * (R) - 74 * (G) + 112 * (B) + 128) >> 8) + 128)
  u_red_reg_op1 <= -38 * red_reg;
  u_green_reg_op1 <= 74 * green_reg;
  u_blue_reg_op1 <= 112 * blue_reg;
  u_reg_op2 <= u_red_reg_op1 - u_green_reg_op1 + u_blue_reg_op1 + 128;
  u_reg_op3 <= clip(shift_right(u_reg_op2,8) + 128);

and clip should of course be

  -- Returns 8-bit unsigned so that 255 is representable
  function clip(value : signed(15 downto 0)) return unsigned is
  begin
    if value > 255 then
      return to_unsigned(255,8);
    elsif value < 0 then
      return to_unsigned(0,8);
    else
      return unsigned(value(7 downto 0));
    end if;
  end clip;

P.S. I hope you are using numeric_std.

If all of this still produces artifacts, check that you are not mixing up the component order of the RGB or YCbCr signals. It is a common mistake.
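One way to pin the component order down is to write it out once and test it in isolation. The C sketch below mirrors the bit positions used in the question's "Fetch a pixel" and "Output data" code (R in bits 23..16, B in 15..8, G in 7..0 on the input; V, U, Y in the same positions on the output). Whether the rest of the pipeline expects exactly this layout is an assumption that needs checking against the actual interface. The type and function names are invented for illustration.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb_t;

/* Unpack a 24-bit input word with the layout taken from the question:
 * R = bits 23..16, B = bits 15..8, G = bits 7..0. */
rgb_t unpack_rbg(uint32_t word)
{
    rgb_t p;
    p.r = (uint8_t)((word >> 16) & 0xFF);
    p.b = (uint8_t)((word >> 8) & 0xFF);
    p.g = (uint8_t)(word & 0xFF);
    return p;
}

/* Pack the output word as the question does:
 * V = bits 23..16, U = bits 15..8, Y = bits 7..0. */
uint32_t pack_vuy(uint8_t y, uint8_t u, uint8_t v)
{
    return ((uint32_t)v << 16) | ((uint32_t)u << 8) | (uint32_t)y;
}
```

Feeding a word with three distinct bytes, such as 0x112233, through `unpack_rbg` makes a swapped pair of components show up immediately, which is much harder to spot in a natural image.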

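Final P.S.: VHDL actually has a fixed-point library with saturation logic, and it is supported by the major FPGA vendors. You could even consider using it to write a solution that is "better" than the C one.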

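Edit: I just read the Wikipedia article, and the whole algorithm does not need clipping or abs or any of that.

1. Basic transform from 8-bit RGB to 16-bit values (Y': unsigned, U/V: signed; the matrix values are rounded so that the later desired Y'UV range of [0..255] each is reached without overflow):

  Y' =  66 * R + 129 * G +  25 * B
  U  = -38 * R -  74 * G + 112 * B
  V  = 112 * R -  94 * G -  18 * B

2. Scale down (">>8") to 8-bit values with rounding ("+128") (Y': unsigned, U/V: signed):

  Yt' = (Y' + 128) >> 8
  Ut  = (U  + 128) >> 8
  Vt  = (V  + 128) >> 8

3. Add an offset to the values to eliminate any negative values (all results are 8-bit unsigned):

  Yu' = Yt' + 16
  Uu  = Ut + 128
  Vu  = Vt + 128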

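You should implement your algorithm in the same way, as three separate steps: multiply and sum in signed arithmetic, then round and scale down, then add the offset. (Check it!)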
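As a reference to check against, here is a minimal C sketch of the three-step structure, using the coefficients from the question's macros. It is a software model only, not VHDL. `yuv_t`, `asr8` and `rgb_to_yuv` are names introduced for this example.

```c
#include <stdint.h>

typedef struct { uint8_t y, u, v; } yuv_t;

/* Floor division by 256: an arithmetic ">> 8" that is well defined
 * for negative values on every C compiler. */
int asr8(int x)
{
    return x >= 0 ? x / 256 : -((-x + 255) / 256);
}

yuv_t rgb_to_yuv(uint8_t r, uint8_t g, uint8_t b)
{
    /* Step 1: weighted sums in signed arithmetic (U and V can go negative). */
    int y = 66 * r + 129 * g + 25 * b;
    int u = -38 * r - 74 * g + 112 * b;
    int v = 112 * r - 94 * g - 18 * b;

    /* Step 2: round (+128) and scale down (>> 8), still signed. */
    y = asr8(y + 128);
    u = asr8(u + 128);
    v = asr8(v + 128);

    /* Step 3: add the offsets; every result now fits in 8 bits unsigned. */
    yuv_t out;
    out.y = (uint8_t)(y + 16);
    out.u = (uint8_t)(u + 128);
    out.v = (uint8_t)(v + 128);
    return out;
}
```

With these coefficients, white (255, 255, 255) maps to (235, 128, 128), black to (16, 128, 128) and pure red to (82, 90, 240). No clamp is needed anywhere because the sign is kept until the offset has been added.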

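A final important point: you need a better test bench, one that checks the output of your implementation against the results of the "golden" source.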
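One possible approach, sketched in C under stated assumptions: instead of a handful of images, sweep the entire 8-bit RGB cube and count the inputs where the device model disagrees with the reference. Here `v_abs` is a hypothetical stand-in for values captured from the HDL simulation and models the abs-based data path from the question. `chroma_fn` and `count_mismatches` are names invented for the example.

```c
/* Floor division by 256, i.e. a portable arithmetic ">> 8". */
int asr8(int x)
{
    return x >= 0 ? x / 256 : -((-x + 255) / 256);
}

/* Golden V, as in the reference macro (result is already within 16..240). */
int v_ref(int r, int g, int b)
{
    return asr8(112 * r - 94 * g - 18 * b + 128) + 128;
}

/* Hypothetical device model: magnitude of the sum, as uv_op_abs() appears to do. */
int v_abs(int r, int g, int b)
{
    int sum = 112 * r - (94 * g + 18 * b) + 128;
    if (sum < 0)
        sum = -sum;
    return ((sum >> 8) + 128) & 0xFF;
}

typedef int (*chroma_fn)(int r, int g, int b);

/* Compare two implementations over all 256^3 RGB inputs. */
long count_mismatches(chroma_fn ref, chroma_fn dut)
{
    long bad = 0;
    for (int r = 0; r < 256; r++)
        for (int g = 0; g < 256; g++)
            for (int b = 0; b < 256; b++)
                if (ref(r, g, b) != dut(r, g, b))
                    bad++;
    return bad;
}
```

`count_mismatches(v_ref, v_ref)` returns 0, while `count_mismatches(v_ref, v_abs)` returns a non-zero count. Pure green (0, 255, 0) is one failing input: 34 from the reference versus 221 from the model. A sweep like this does not depend on which colors happen to appear in the test images.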

vhdl video-processing yuv