How to encode planar 4:2:0 (fourcc P010)

Problem description

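I am trying to re-encode fourcc V210 (a packed YUV 4:2:2 format) to P010 (planar YUV 4:2:0). I think I implemented it according to the spec, but the renderer shows a wrong image, so something is off. ffmpeg has a decent example of decoding V210 (the defines below are adapted from their solution), but I cannot find a P010 encoder to check what I am doing wrong.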

(Yes, I tried ffmpeg and it works, but it is too slow: about 30 ms per frame on an Intel Gen11 i7.)

Clarification (after @Frank's question): the frames being processed are 4k (3840 pixels wide), so the code does nothing for 128-byte alignment.

This runs on Intel, so little-endian conversions apply.
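To illustrate why a 3840-pixel-wide frame needs no extra alignment handling, here is a minimal sketch of the V210 row stride arithmetic. It assumes the usual V210 rule that rows are padded to a multiple of 48 pixels (128 bytes). The function names `V210StrideBytes` and `V210RowPaddingBytes` are made up for this example and are not part of the code in the question.

```cpp
#include <cassert>
#include <cstdint>

// V210 stores 6 pixels in 16 bytes, and each row is padded so that it
// covers a multiple of 48 pixels, which is a multiple of 128 bytes.
inline uint32_t V210StrideBytes(uint32_t width)
{
    assert(width > 0);
    const uint32_t alignedWidth = ((width + 47) / 48) * 48;
    return alignedWidth * 8 / 3;   // 16 bytes per 6 pixels
}

// Padding bytes at the end of each row (hypothetical helper).
inline uint32_t V210RowPaddingBytes(uint32_t width)
{
    const uint32_t packs = (width + 5) / 6;          // whole 6-pixel packs
    return V210StrideBytes(width) - packs * 16;      // stride minus payload
}
```

For a width of 3840 the stride is exactly the payload size, so rows follow each other without padding. Other widths, 1280 for example, do end in padding bytes that a reader has to skip.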

Try 1 - all-green picture

The following code

#define V210_READ_PACK_BLOCK(a,b,c) \
    do {                              \
        val  = *src++;                \
        a = val & 0x3FF;              \
        b = (val >> 10) & 0x3FF;      \
        c = (val >> 20) & 0x3FF;      \
    } while (0)

#define PIXELS_PER_PACK 6
#define BYTES_PER_PACK (4*4)

void MyClass::FormatVideoFrame(
    BYTE* inFrame,BYTE* outBuffer)
{
    const uint32_t pixels = m_height * m_width;

    const uint32_t* src = (const uint32_t *)inFrame;

    uint16_t* dstY = (uint16_t *)outBuffer;

    uint16_t* dstUvstart = (uint16_t*)(outBuffer + ((ptrdiff_t)pixels * sizeof(uint16_t)));
    uint16_t* dstUV = dstUvstart;

    const uint32_t packsPerLine = m_width / PIXELS_PER_PACK;

    for (uint32_t line = 0; line < m_height; line++)
    {
        for (uint32_t pack = 0; pack < packsPerLine; pack++)
        {
            uint32_t val;
            uint16_t u,y1,y2,v;

            if (pack % 2 == 0)
            {
                V210_READ_PACK_BLOCK(u,y1,v);
                *dstUV++ = u;
                *dstY++ = y1;
                *dstUV++ = v;

                V210_READ_PACK_BLOCK(y1,u,y2);
                *dstY++ = y1;
                *dstUV++ = u;
                *dstY++ = y2;

                V210_READ_PACK_BLOCK(v,y1,u);
                *dstUV++ = v;
                *dstY++ = y1;
                *dstUV++ = u;

                V210_READ_PACK_BLOCK(y1,v,y2);
                *dstY++ = y1;
                *dstUV++ = v;
                *dstY++ = y2;
            }
            else
            {
                V210_READ_PACK_BLOCK(u,y1,v);
                *dstY++ = y1;

                V210_READ_PACK_BLOCK(y1,u,y2);
                *dstY++ = y1;
                *dstY++ = y2;

                V210_READ_PACK_BLOCK(v,y1,u);
                *dstY++ = y1;

                V210_READ_PACK_BLOCK(y1,v,y2);
                *dstY++ = y1;
                *dstY++ = y2;
            }
        }
    }

#ifdef _DEBUG

    // Fully written Y space
    assert(dstY == dstUvstart);

    // Fully written UV space
    const BYTE* expectedVurrentUVPtr = outBuffer + (ptrdiff_t)GetoutFrameSize();
    assert(expectedVurrentUVPtr == (BYTE *)dstUV);

#endif
}

// This is called to determine outBuffer size
LONG MyClass::GetoutFrameSize() const
{
    const LONG pixels = m_height * m_width;

    return
        (pixels * sizeof(uint16_t)) +  // Every pixel 1 y
        (pixels / 2 / 2 * (2 * sizeof(uint16_t)));  // Every 2 pixels and every odd row 2 16-bit numbers
}

results in an all-green image. This turned out to be a missing bit shift: per the P010 spec, the 10 bits go into the upper bits of the 16-bit value.
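As a minimal sketch of that shift, the helper below places a 10-bit sample in the upper bits of a 16-bit word. The name `ToP010Sample` is hypothetical and only used for illustration here.

```cpp
#include <cassert>
#include <cstdint>

// Place a 10-bit sample in the upper 10 bits of a 16-bit P010 word.
// The lower 6 bits stay zero.
inline uint16_t ToP010Sample(uint16_t value10)
{
    assert(value10 <= 0x3FF);
    return static_cast<uint16_t>(value10 << 6);
}
```

Without the shift, every sample lands in the range 0..1023 of a 16-bit word. The renderer then sees Y, U and V all close to zero, which is a plausible explanation for the solid green frame.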

Try 2 - Y works, UV doubled?

Updated the code to correctly (or so I thought) shift the YUV values into the right position within their 16-bit space.

#define V210_READ_PACK_BLOCK(a,b,c) \
    do {                              \
        val  = *src++;                \
        a = val & 0x3FF;              \
        b = (val >> 10) & 0x3FF;      \
        c = (val >> 20) & 0x3FF;      \
    } while (0)


#define P010_WRITE_VALUE(d,v) (*(d)++ = (uint16_t)((v) << 6))

#define PIXELS_PER_PACK 6
#define BYTES_PER_PACK (4 * sizeof(uint32_t))

// Snipped constructor here which guarantees that we're processing
// something which does not violate alignment.

void MyClass::FormatVideoFrame(
    const BYTE* inBuffer,BYTE* outBuffer)
{   
    const uint32_t pixels = m_height * m_width;
    const uint32_t aligned_width = ((m_width + 47) / 48) * 48;
    const uint32_t stride = aligned_width * 8 / 3;

    uint16_t* dstY = (uint16_t *)outBuffer;

    uint16_t* dstUvstart = (uint16_t*)(outBuffer + ((ptrdiff_t)pixels * sizeof(uint16_t)));
    uint16_t* dstUV = dstUvstart;

    const uint32_t packsPerLine = m_width / PIXELS_PER_PACK;

    for (uint32_t line = 0; line < m_height; line++)
    {
        // Lines start at 128 byte alignment
        const uint32_t* src = (const uint32_t*)(inBuffer + (ptrdiff_t)(line * stride));

        for (uint32_t pack = 0; pack < packsPerLine; pack++)
        {
            uint32_t val;
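            uint16_t u,y1,y2,v;

            // Bug: this toggles per pack; it should toggle per line (see the correction below)
            if (pack % 2 == 0)
            {
                V210_READ_PACK_BLOCK(u,y1,v);
                P010_WRITE_VALUE(dstUV,u);
                P010_WRITE_VALUE(dstY,y1);
                P010_WRITE_VALUE(dstUV,v);

                V210_READ_PACK_BLOCK(y1,u,y2);
                P010_WRITE_VALUE(dstY,y1);
                P010_WRITE_VALUE(dstUV,u);
                P010_WRITE_VALUE(dstY,y2);

                V210_READ_PACK_BLOCK(v,y1,u);
                P010_WRITE_VALUE(dstUV,v);
                P010_WRITE_VALUE(dstY,y1);
                P010_WRITE_VALUE(dstUV,u);

                V210_READ_PACK_BLOCK(y1,v,y2);
                P010_WRITE_VALUE(dstY,y1);
                P010_WRITE_VALUE(dstUV,v);
                P010_WRITE_VALUE(dstY,y2);
            }
            else
            {
                V210_READ_PACK_BLOCK(u,y1,v);
                P010_WRITE_VALUE(dstY,y1);

                V210_READ_PACK_BLOCK(y1,u,y2);
                P010_WRITE_VALUE(dstY,y1);
                P010_WRITE_VALUE(dstY,y2);

                V210_READ_PACK_BLOCK(v,y1,u);
                P010_WRITE_VALUE(dstY,y1);

                V210_READ_PACK_BLOCK(y1,v,y2);
                P010_WRITE_VALUE(dstY,y1);
                P010_WRITE_VALUE(dstY,y2);
            }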
        }
    }

#ifdef _DEBUG

    // Fully written Y space
    assert(dstY == dstUvstart);

    // Fully written UV space
    const BYTE* expectedVurrentUVPtr = outBuffer + (ptrdiff_t)GetoutFrameSize();
    assert(expectedVurrentUVPtr == (BYTE *)dstUV);

#endif
}

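This gives a correct Y, and the number of lines for U and V is also correct, but somehow U and V are not overlaid properly. There are two copies of the chroma, seemingly mirrored vertically through the center. Zeroing out V gives something similar but less visible. So are both of them rendered at half width? Any hints appreciated :)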

Correction: found the bug. I was toggling the UV writes per pack instead of per line.

if (pack % 2 == 0)

should be

if (line % 2 == 0)

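Solution

There were 2 bugs. The first: I did not shift the 10-bit values into the upper bits as the spec requires. The second: I was writing UV on every other pack instead of every other line.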
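Putting both fixes together, a conversion loop could look roughly like the sketch below. This is not the original `MyClass::FormatVideoFrame`. It is a hypothetical free function, `V210ToP010`, which assumes a little-endian host, a width that is a multiple of 6, an even height, and an output buffer holding `width * height * 3 / 2` 16-bit values. Chroma from odd lines is simply dropped, as in the code above, rather than averaged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: convert one V210 frame to P010.
// Assumes a little-endian host, width % 6 == 0 and an even height.
inline void V210ToP010(const uint8_t* in, uint16_t* out,
                       uint32_t width, uint32_t height)
{
    assert(width % 6 == 0 && height % 2 == 0);

    // V210 rows are padded to a multiple of 48 pixels (128 bytes).
    const uint32_t stride = ((width + 47) / 48) * 48 * 8 / 3;

    uint16_t* dstY = out;                                        // Y plane
    uint16_t* dstUV = out + static_cast<size_t>(width) * height; // interleaved UV plane

    for (uint32_t line = 0; line < height; ++line)
    {
        const uint8_t* row = in + static_cast<size_t>(line) * stride;
        const bool writeChroma = (line % 2 == 0);  // fix 2: per line, not per pack

        for (uint32_t pack = 0; pack < width / 6; ++pack)
        {
            uint32_t words[4];
            std::memcpy(words, row + pack * 16, sizeof(words));

            // Each 32-bit word carries three 10-bit samples.
            // Sample order in a pack: U0 Y0 V0 | Y1 U1 Y2 | V1 Y3 U2 | Y4 V2 Y5
            uint16_t s[12];
            for (int i = 0; i < 4; ++i)
            {
                s[3 * i + 0] = static_cast<uint16_t>(words[i] & 0x3FF);
                s[3 * i + 1] = static_cast<uint16_t>((words[i] >> 10) & 0x3FF);
                s[3 * i + 2] = static_cast<uint16_t>((words[i] >> 20) & 0x3FF);
            }

            for (int i = 0; i < 6; ++i)
            {
                // fix 1: shift the 10 bits into the upper bits of the 16-bit word
                *dstY++ = static_cast<uint16_t>(s[2 * i + 1] << 6);  // luma at odd indices
                if (writeChroma)
                    *dstUV++ = static_cast<uint16_t>(s[2 * i] << 6); // U,V,U,V,U,V at even indices
            }
        }
    }
}
```

Unpacking into a small array first keeps the sample order visible in one comment. A version tuned for speed would probably write the values directly, as the macros in the question do.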

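Leaving this here for the disco-effect value; maybe someone else needs to play with this and ends up walking the same path. I learned that "just follow the spec" works even in completely unknown territory :) Thanks to everyone who had a look.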