How to pad a batch with zeros in PyTorch

Problem description

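Is there a better way to do this? How can I pad a tensor with zeros without creating a new tensor object? I need the input to always have the same batch size, so I want to zero-pad any input that is smaller than the batch size. It is like zero-padding in NLP when a sequence is too short, except that here the padding is along the batch dimension.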

Currently I create a new tensor, but because of that my GPU runs out of memory. I don't want to halve the batch size just to handle this operation.

import torch
from torch import nn

class MyModel(nn.Module):
    def __init__(self,batchsize=16):
        super().__init__()
        self.batchsize = batchsize
    
    def forward(self,x):
        b,d = x.shape
        
        print(x.shape) # torch.Size([7,32])

        if b != self.batchsize: # 2. I need batches of size 16; if a batch is smaller, I want to pad the rest with zeros
            new_x = torch.zeros(self.batchsize,d) # 3. so I create a new tensor, but this is bad because it greatly increases the GPU memory required
            new_x[0:b,:] = x
            x = new_x
            b = self.batchsize
        
        print(x.shape) # torch.Size([16,32])

        return x

model = MyModel()
x = torch.randn((7,32)) # 1. the batch dimension is 7 because this is the last batch, and I don't want to use "drop_last"
y = model(x)
print(y.shape)

Solution

You can pad the extra elements like this:

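import torch.nn.functional as F

n = self.batchsize - b  # number of zero rows to add

# F.pad takes (before, after) pairs, starting from the last dimension
new_x = F.pad(x, (0, 0, n, 0))  # pad the start of 2d tensors
new_x = F.pad(x, (0, 0, 0, n))  # pad the end of 2d tensors
new_x = F.pad(x, (0, 0, 0, 0, 0, n))  # pad the end of 3d tensors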
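
For reference, here is a minimal, self-contained sketch of the same idea that works for tensors of any rank. The helper name pad_batch is hypothetical (it is not part of PyTorch or of the question's code), and it assumes the batch is always dimension 0 and that padding goes at the end.

```python
import torch
import torch.nn.functional as F


def pad_batch(x: torch.Tensor, batchsize: int) -> torch.Tensor:
    """Zero-pad dimension 0 of x up to batchsize rows (hypothetical helper)."""
    n = batchsize - x.shape[0]
    if n <= 0:
        return x  # already full (or larger): nothing to pad
    # F.pad reads the pad tuple from the last dimension backwards, with one
    # (before, after) pair per dimension, so the pair for dimension 0 is last.
    pad = tuple([0, 0] * (x.dim() - 1) + [0, n])
    return F.pad(x, pad)  # default mode is constant padding with value 0


x2d = torch.randn(7, 32)
y2d = pad_batch(x2d, 16)
print(y2d.shape)  # torch.Size([16, 32])

x3d = torch.randn(7, 5, 32)
y3d = pad_batch(x3d, 16)
print(y3d.shape)  # torch.Size([16, 5, 32])
```

Note that F.pad still returns a new tensor, but it is allocated on the same device and with the same dtype as x, and the original rows are copied over unchanged.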
