Getting a tensor of 600 values instead of 3 values for the mean and std of train_loader in PyTorch

Problem description

I am trying to normalize my image data, and for that I need to find the mean and std of train_loader.

mean = 0.0
std = 0.0
nb_samples = 0.0
for data in train_loader:
    images,landmarks = data["image"],data["landmarks"]
    batch_samples = images.size(0)
    images_data = images.view(batch_samples,images.size(1),-1)
    mean +=  torch.Tensor.float(images_data).mean(2).sum(0)
    std += torch.Tensor.float(images_data).std(2).sum(0)
    ###mean += images_data.mean(2).sum(0)
    ###std += images_data.std(2).sum(0)
    nb_samples += batch_samples

mean /= nb_samples
std /= nb_samples

Here, mean and std each come out as torch.Size([600]).

When I try (almost) the same code on the DataLoader, it works as expected:

# code from https://discuss.pytorch.org/t/about-normalization-using-pre-trained-vgg16-networks/23560/6?u=mona_jalal
mean = 0.0
std = 0.0
nb_samples = 0.0
for data in DataLoader:
    images, landmarks = data["image"], data["landmarks"]
    batch_samples = images.size(0)

    images_data = images.view(batch_samples, images.size(1), -1)
    mean += images_data.mean(2).sum(0)
    std += images_data.std(2).sum(0)
    nb_samples += batch_samples

mean /= nb_samples
std /= nb_samples

I get: mean is: tensor([0.4192, 0.4195, 0.4195], dtype=torch.float64), std is: tensor([0.1182, 0.1184, 0.1186], dtype=torch.float64)

So my dataloader is:

class MothLandmarksDataset(Dataset):
    """Face Landmarks dataset."""

    def __init__(self,csv_file,root_dir,transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable,optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self,idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        img_name = os.path.join(self.root_dir,self.landmarks_frame.iloc[idx,0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx,1:]
        landmarks = np.array([landmarks])
        landmarks = landmarks.astype('float').reshape(-1,2)
        sample = {'image': image,'landmarks': landmarks}

        if self.transform:
            sample = self.transform(sample)

        return sample

transformed_dataset = MothLandmarksDataset(csv_file='moth_gt.csv',
                                           root_dir='.',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(224),
                                               ToTensor()
                                           ]))



DataLoader = DataLoader(transformed_dataset,batch_size=3,shuffle=True,num_workers=4)

and train_loader is:

# Device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
seed = 42
np.random.seed(seed)
torch.manual_seed(seed)

# split the dataset into training and validation sets
len_valid_set = int(0.1*len(dataset))
len_train_set = len(dataset) - len_valid_set

print("The length of Train set is {}".format(len_train_set))
print("The length of Test set is {}".format(len_valid_set))

train_dataset, valid_dataset = torch.utils.data.random_split(dataset, [len_train_set, len_valid_set])

# shuffle and batch the datasets
train_loader = torch.utils.data.DataLoader(train_dataset,batch_size=8,num_workers=4)
test_loader = torch.utils.data.DataLoader(valid_dataset,num_workers=4)

Please let me know if more information is needed.

I basically need to get 3 values for the mean of train_loader and 3 values for its std, to use as the args of Normalize.

Inside the loop, images_data from the DataLoader is torch.Size([3, 3, 50176]), while images_data from the train_loader is torch.Size([8, 600, 2400]).


Solution

First, the weird shape ([600]) of your mean and std is explained by your data having shape [8, 600, 800, 3]. Basically, the channel dimension is the last one here, so when you use

# (N, 600, 800, 3) -> [view] -> (N, 600, 2400 = 800*3)
images_data = images.view(batch_samples, images.size(1), -1)

you actually perform a weird operation that fuses together the width and channel dimensions of your images, which are now [8, 600, 2400]. Hence, applying

# (8, 600, 2400) -> [mean(2)] -> (8, 600) -> [sum(0)] -> (600)
data.mean(2).sum(0)

creates a tensor of size [600], which is indeed what you get.
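The shape arithmetic above can be reproduced with a small sketch; this uses NumPy's reshape in place of torch's view, and a random stand-in batch with the same (8, 600, 800, 3) shape:

```python
import numpy as np

# Stand-in for one batch from train_loader: (N, H, W, C) = (8, 600, 800, 3)
images = np.random.rand(8, 600, 800, 3)

# The problematic flatten: keeps dim 1 (H = 600) and fuses W with C
images_data = images.reshape(8, images.shape[1], -1)   # (8, 600, 2400)

# mean over the fused axis, then sum over the batch axis
result = images_data.mean(axis=2).sum(axis=0)          # (600,)
print(result.shape)  # (600,) -- one value per image row, not per channel
```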

There are two very simple solutions. You can start by permuting the dimensions to make the second dimension the channel one:

batch_samples = images.size(0)
# (N, H, W, C) -> (N, C, H, W)
reordered = images.permute(0, 3, 1, 2)
# flatten image into (N, C, H*W); use reshape because permute makes the
# tensor non-contiguous, so .view() would raise an error here
images_data = reordered.reshape(batch_samples, reordered.size(1), -1)
# mean is now (C) = (3)
mean += images_data.mean(2).sum(0)
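As a sanity check, here is the same fix sketched with NumPy (transpose/reshape standing in for torch's permute/view, on a random stand-in batch), confirming the per-channel shape:

```python
import numpy as np

# Stand-in batch with channels last: (N, H, W, C) = (8, 600, 800, 3)
images = np.random.rand(8, 600, 800, 3)

# (N, H, W, C) -> (N, C, H, W), then flatten to (N, C, H*W)
reordered = np.transpose(images, (0, 3, 1, 2))
images_data = reordered.reshape(images.shape[0], reordered.shape[1], -1)

# per-channel mean per image, summed over the batch -> shape (3,)
mean = images_data.mean(axis=2).sum(axis=0)
print(mean.shape)  # (3,)
```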

Or you can change the axes over which you apply mean and sum:

batch_samples = images.size(0)
# flatten image into (N, H*W, C) -- careful, this is not what you did
images_data = images.view(batch_samples, -1, images.size(-1))
# mean is now (C) = (3)
mean += images_data.mean(1).sum(0)
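One caveat worth noting, beyond the shape fix: summing per-image stds and dividing by the sample count gives the average per-image std, not the std of the whole dataset. If you want the latter, you can compute it per channel from sums and squared sums, sketched here with NumPy on random stand-in data:

```python
import numpy as np

# Stand-in for the whole channels-last dataset: (N, H, W, C)
images = np.random.rand(8, 600, 800, 3)

flat = images.reshape(-1, images.shape[-1])        # (N*H*W, C)
channel_mean = flat.mean(axis=0)                   # (3,)
# std via E[x^2] - E[x]^2, computed per channel
channel_std = np.sqrt((flat ** 2).mean(axis=0) - channel_mean ** 2)
print(channel_mean.shape, channel_std.shape)       # (3,) (3,)
```

In a streaming loop you would accumulate `flat.sum(0)`, `(flat ** 2).sum(0)` and the pixel count per batch, then combine them at the end.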

Finally, why do DataLoader and train_loader behave differently? Well, I think it is because one uses dataset while the other uses transformed_dataset. In transformed_dataset you apply the ToTensor transform, which casts a PIL image (or an ndarray) into a torch tensor, and pytorch is smart enough to permute the dimensions during this operation (putting the channels into the second dimension, right after the batch one). In other words, your two datasets do not yield images in the same format: their axes are ordered differently.
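What ToTensor does can be sketched in a few lines with NumPy; the (600, 800, 3) uint8 image is an assumed stand-in for what io.imread returns here:

```python
import numpy as np

# An HWC uint8 image, as io.imread would return it (assumed shape)
img_hwc = np.random.randint(0, 256, size=(600, 800, 3), dtype=np.uint8)

# ToTensor-style conversion: scale to [0, 1] and move channels first
img_chw = np.transpose(img_hwc.astype(np.float32) / 255.0, (2, 0, 1))
print(img_chw.shape)  # (3, 600, 800)
```

A batch of such converted images therefore has shape (N, 3, H, W), which is why the DataLoader loop sees the channels in dimension 1 while train_loader does not.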