Problem description
I've spent a lot of time trying to debug some PyTorch code, and I've put together a minimal example to help narrow down what the problem might be.
I've removed every part of the code that isn't relevant to the issue, so the remaining snippet doesn't make much sense from a functional point of view, but it still exhibits the error I'm facing.
The overall task I'm working on is a loop in which each pass computes the embedding of an image and adds it into a variable that stores it. It effectively aggregates it (no concatenation, so the size stays constant). I wouldn't expect the number of iterations to cause a data type overflow, and I don't observe that happening, either here or in my actual code.
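In pseudocode, the pattern is roughly the following (a minimal sketch; H, W, locations and compute_embedding are illustrative placeholders, not names from my code, which follows below):

embedding = torch.zeros(2048, H, W)                    # fixed-size accumulator
for (r, c) in locations:
    delta = compute_embedding(img).view(-1, 1, 1)      # one embedding per pass of the loop
    embedding[:, r:r+1, c:c+1] += delta                # added in place, so the size never grows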
My environment is as follows:
- Python 3.6.2
- PyTorch 1.4.0
- CUDA toolkit 10.0
- Driver version 410.78
- GPU: Nvidia GeForce GT 1030 (2GB VRAM)
(Though I've replicated this experiment with the same result on a Titan RTX with 24GB of VRAM, same PyTorch version, CUDA toolkit and driver; it just runs out of memory further into the loop.)
The full code is below. I've marked two lines as the "culprits": removing them makes the problem go away, though obviously I need to find a way to execute them without running into memory issues. Any help would be greatly appreciated! You can use any image named "source_image.bmp" to reproduce the problem.
import torch
from PIL import Image
import torchvision
from torchvision import transforms
from pynvml import nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo, nvmlInit
import sys
import os

os.environ["CUDA_VISIBLE_DEVICES"] = '0'  # this is necessary on my system to allow the environment to recognize my nvidia GPU for some reason
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # to debug by having all CUDA functions executed in place

torch.set_default_tensor_type('torch.cuda.FloatTensor')

# Preprocess image
tfms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = tfms(Image.open('source_image.bmp')).unsqueeze(0).cuda()

model = torchvision.models.resnet50(pretrained=True).cuda()
model.eval()  # we put the model in evaluation mode, to prevent storage of gradient which might accumulate

nvmlInit()
h = nvmlDeviceGetHandleByIndex(0)
info = nvmlDeviceGetMemoryInfo(h)
print(f'Total available memory : {info.total / 1000000000}')

feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])
orig_embedding = feature_extractor(img)

embedding_depth = 2048
mem0 = 0
embedding = torch.zeros(2048, img.shape[2], img.shape[3])  # , dtype=torch.float)

patch_size = [4, 4]
patch_stride = [2, 2]
patch_value = 0.0

# Here, we iterate over the patch placement, defined at the top left location
for row in range(img.shape[2] - 1):
    for col in range(img.shape[3] - 1):
        print("######################################################")

        ######################################################
        # Isolated line, culprit 1 of the GPU memory leak
        ######################################################
        patched_embedding = feature_extractor(img)

        delta_embedding = (patched_embedding - orig_embedding).view(-1, 1, 1)

        ######################################################
        # Isolated line, culprit 2 of the GPU memory leak
        ######################################################
        embedding[:, row:row+1, col:col+1] = torch.add(embedding[:, row:row+1, col:col+1], delta_embedding)

        print("img size:\t\t", img.element_size() * img.nelement())
        print("patched_embedding size:\t", patched_embedding.element_size() * patched_embedding.nelement())
        print("delta_embedding size:\t", delta_embedding.element_size() * delta_embedding.nelement())
        print("Embedding size:\t\t", embedding.element_size() * embedding.nelement())

        del patched_embedding, delta_embedding
        torch.cuda.empty_cache()

        info = nvmlDeviceGetMemoryInfo(h)
        print("\nMem usage increase:\t", info.used / 1000000000 - mem0)
        mem0 = info.used / 1000000000
        print(f'Free:\t\t\t {(info.total - info.used) / 1000000000}')

print("Done.")
Workaround

Add this to the code immediately after loading the model:

for param in model.parameters():
    param.requires_grad = False

From https://pytorch.org/docs/stable/notes/autograd.html#excluding-subgraphs-from-backward
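For context on why this works: resnet50(pretrained=True) loads its parameters with requires_grad=True, so every feature_extractor(img) call records an autograd graph, and the slice assignment into embedding keeps a reference to each iteration's graph, which, as far as I can tell, is why del and torch.cuda.empty_cache() cannot reclaim that memory. An equivalent alternative (a sketch, assuming gradients are not needed anywhere in the loop) is to run the forward passes under torch.no_grad(), which prevents the graphs from being recorded in the first place:

with torch.no_grad():  # disables autograd recording for everything inside the block
    for row in range(img.shape[2] - 1):
        for col in range(img.shape[3] - 1):
            patched_embedding = feature_extractor(img)
            delta_embedding = (patched_embedding - orig_embedding).view(-1, 1, 1)
            embedding[:, row:row+1, col:col+1] += delta_embedding

Either way, no computation graph ends up attached to embedding, and GPU memory usage stays flat across iterations.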