Problem Description
I am trying to normalize my image data, and to do that I need to find the mean and std of train_loader.
mean = 0.0
std = 0.0
nb_samples = 0.0
for data in train_loader:
    images, landmarks = data["image"], data["landmarks"]
    batch_samples = images.size(0)
    images_data = images.view(batch_samples, images.size(1), -1)
    mean += torch.Tensor.float(images_data).mean(2).sum(0)
    std += torch.Tensor.float(images_data).std(2).sum(0)
    ### mean += images_data.mean(2).sum(0)
    ### std += images_data.std(2).sum(0)
    nb_samples += batch_samples
mean /= nb_samples
std /= nb_samples
Here, mean and std each come out with shape torch.Size([600]).
When I try (almost) the same code on the DataLoader, it works as expected:
# code from https://discuss.pytorch.org/t/about-normalization-using-pre-trained-vgg16-networks/23560/6?u=mona_jalal
mean = 0.0
std = 0.0
nb_samples = 0.0
for data in DataLoader:
    images, landmarks = data["image"], data["landmarks"]
    batch_samples = images.size(0)
    images_data = images.view(batch_samples, images.size(1), -1)
    mean += images_data.mean(2).sum(0)
    std += images_data.std(2).sum(0)
    nb_samples += batch_samples
mean /= nb_samples
std /= nb_samples
I get:
mean is: tensor([0.4192, 0.4195, 0.4195], dtype=torch.float64), std is: tensor([0.1182, 0.1184, 0.1186], dtype=torch.float64)
So my DataLoader is:
class MothLandmarksDataset(Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        img_name = os.path.join(self.root_dir, self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:]
        landmarks = np.array([landmarks])
        landmarks = landmarks.astype('float').reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}
        if self.transform:
            sample = self.transform(sample)
        return sample

transformed_dataset = MothLandmarksDataset(csv_file='moth_gt.csv',
                                           root_dir='.',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(224),
                                               ToTensor()
                                           ]))

DataLoader = DataLoader(transformed_dataset, batch_size=3, shuffle=True, num_workers=4)
And train_loader is:
# Device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
seed = 42
np.random.seed(seed)
torch.manual_seed(seed)

# split the dataset into validation and test sets
len_valid_set = int(0.1 * len(dataset))
len_train_set = len(dataset) - len_valid_set
print("The length of Train set is {}".format(len_train_set))
print("The length of Test set is {}".format(len_valid_set))
train_dataset, valid_dataset = torch.utils.data.random_split(dataset, [len_train_set, len_valid_set])

# shuffle and batch the datasets
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=8, num_workers=4)
test_loader = torch.utils.data.DataLoader(valid_dataset, num_workers=4)
Please let me know if any more information is needed.
Basically, I need to get the mean of train_loader as 3 values and the std as 3 values, to use as args for Normalize.
images_data in the DataLoader loop is torch.Size([3, 3, 50176]), while in the train_loader loop it is torch.Size([8, 600, 2400]).
Solution
First, the weird [600] shape of your mean and std is due to your data having shape [8, 600, 800, 3]. Basically, the channel dimension is the last one here, so when you use

# (8, 600, 800, 3) -> [view] -> (8, 600, 2400 = 800*3)
images_data = images.view(batch_samples, images.size(1), -1)

you actually perform a strange operation that fuses together the width and channel dimensions of your image, which is now (8, 600, 2400). Hence, applying

# (8, 600, 2400) -> [mean(2)] -> (8, 600) -> [sum(0)] -> (600)
data.mean(2).sum(0)

creates a tensor of size [600], which is indeed what you get.
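To see this concretely, here is a minimal sketch that uses random data in place of the real images (same (N, H, W, C) layout as in the question) and reproduces the [600]-shaped result:

```python
import torch

# fake batch with the layout from the question: (N, H, W, C) = (8, 600, 800, 3)
images = torch.rand(8, 600, 800, 3)

batch_samples = images.size(0)
# this fuses W and C into one axis of 800 * 3 = 2400, instead of isolating channels
images_data = images.view(batch_samples, images.size(1), -1)
print(images_data.shape)   # torch.Size([8, 600, 2400])

mean = images_data.mean(2).sum(0)
print(mean.shape)          # torch.Size([600]) -- one value per image row, not per channel
```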
There are two rather simple solutions. You can start by permuting the dimensions so that the channel dimension becomes the second one:
batch_samples = images.size(0)
# (N, H, W, C) -> (N, C, H, W)
reordered = images.permute(0, 3, 1, 2)
# flatten image into (N, C, H*W); use reshape rather than view,
# since permute makes the tensor non-contiguous
images_data = reordered.reshape(batch_samples, reordered.size(1), -1)
# mean is now (C,) = (3,)
mean += images_data.mean(2).sum(0)
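A quick sanity check of this first fix, again with random data standing in for the real batch:

```python
import torch

images = torch.rand(8, 600, 800, 3)      # (N, H, W, C)

batch_samples = images.size(0)
reordered = images.permute(0, 3, 1, 2)   # (N, C, H, W)
# reshape (not view) because permute returns a non-contiguous tensor
images_data = reordered.reshape(batch_samples, reordered.size(1), -1)
print(images_data.shape)                 # torch.Size([8, 3, 480000])

mean = images_data.mean(2).sum(0)
print(mean.shape)                        # torch.Size([3])
```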
Or you can change the axes along which mean and sum are applied:
batch_samples = images.size(0)
# flatten image into (N, H*W, C); careful, this is not what you did:
# channels are the last axis here, hence images.size(-1)
images_data = images.view(batch_samples, -1, images.size(-1))
# mean is now (C,) = (3,)
mean += images_data.mean(1).sum(0)
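The two fixes are equivalent; a small check with random data (same shapes as in the question) confirms they produce the same per-channel result:

```python
import torch

images = torch.rand(8, 600, 800, 3)   # (N, H, W, C)
n = images.size(0)

# fix 1: permute to channels-first, then flatten the spatial dims
m1 = images.permute(0, 3, 1, 2).reshape(n, 3, -1).mean(2).sum(0)

# fix 2: flatten the spatial dims, keep channels last, reduce over axis 1
m2 = images.view(n, -1, images.size(-1)).mean(1).sum(0)

print(torch.allclose(m1, m2, atol=1e-5))   # True
```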
Finally, why do DataLoader and train_loader behave differently? Well, I think it is because one uses dataset while the other uses transformed_dataset. In transformed_dataset you apply the ToTensor transform, which casts the PIL image to a torch tensor and, I believe, is smart enough to permute the dimensions during this operation (putting the channels in the second dimension). In other words, your two datasets do not yield images in the same format; their axes are ordered differently.