TensorFlow error when running segmentation

Problem description

I am using a Jetson Xavier NX to run segmentation built with [Segmentation][1]. These are the versions of the libraries I am using:

tensorflow - 1.15.4
keras - 2.1.5
python - 3.6.9

However, when I run my program, I get the following error:

2021-06-14 20:30:53.671609: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES Failed at random_op.cc:76 : Resource exhausted: OOM when allocating tensor with shape[3,3,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
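
For scale, the tensor named in the error message is small. The sketch below estimates its size, assuming `float` in the log means 32-bit floats (4 bytes per element); `tensor_bytes` is a hypothetical helper written for this illustration, not part of TensorFlow.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape (assumes float32 by default)."""
    return reduce(mul, shape, 1) * bytes_per_element


# Shape copied from the OOM message: shape[3,3,128,128]
failed_shape = (3, 3, 128, 128)
size = tensor_bytes(failed_shape)
print(size, "bytes =", size / 1024, "KiB")
```

Under that assumption the failed allocation is only about 576 KiB, which suggests the GPU allocator had almost no free memory left at that point, rather than this one tensor being too large.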

Here is my code:

#!/usr/bin/env python3
# coding: utf-8


import mrcnn
#print(mrcnn)
import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

# Root directory of the project
ROOT_DIR = os.path.abspath("../")
print(ROOT_DIR)


# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import COCO config
sys.path.append(os.path.join(ROOT_DIR,"/Mask_RCNN/samples/coco/"))  # To find local version
import coco

#get_ipython().run_line_magic('matplotlib','inline')

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR,"logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR,"mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR,"images")

class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference",model_dir=MODEL_DIR,config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH,by_name=True)

# COCO Class names
# Index of the class in the list is its ID. For example,to get ID of
# the teddy bear class,use: class_names.index('teddy bear')
class_names = ['BG','person','bicycle','car','motorcycle','airplane','bus','train','truck','boat','traffic light','fire hydrant','stop sign','parking meter','bench','bird','cat','dog','horse','sheep','cow','elephant','bear','zebra','giraffe','backpack','umbrella','handbag','tie','suitcase','frisbee','skis','snowboard','sports ball','kite','baseball bat','baseball glove','skateboard','surfboard','tennis racket','bottle','wine glass','cup','fork','knife','spoon','bowl','banana','apple','sandwich','orange','broccoli','carrot','hot dog','pizza','donut','cake','chair','couch','potted plant','bed','dining table','toilet','tv','laptop','mouse','remote','keyboard','cell phone','microwave','oven','toaster','sink','refrigerator','book','clock','vase','scissors','teddy bear','hair drier','toothbrush']

import cv2
# List the files in the images folder (not used below)
file_names = next(os.walk(IMAGE_DIR))[2]
# Load a fixed sample image
image = skimage.io.imread("/sample_images/sample3.jpg")

# Run detection
results = model.detect([image],verbose=1)

# Visualize results
r = results[0]
visualize.display_instances(image,r['rois'],r['masks'],r['class_ids'],class_names,r['scores'])
cv2.imwrite("hi.jpg",image)

I ran the same program on AWS EC2. The only difference is the tensorflow version there (I used 1.8.0 GPU), and it ran fine. Is the tensorflow version causing the error?

Edit: I have added this to the beginning of my code, as suggested in some GitHub issues:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
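
One thing worth checking: a `tf.Session` created this way is not automatically the session that standalone Keras uses, so the `allow_growth` option may never reach the model. The sketch below registers the session with Keras through `keras.backend.set_session`. It assumes TensorFlow 1.x and standalone Keras with the TensorFlow backend, as in the versions listed above; `configure_keras_session` and its parameters are hypothetical names for this illustration, and whether this removes the OOM on the Jetson is not confirmed.

```python
def configure_keras_session(allow_growth=True, memory_fraction=None):
    """Create a TF 1.x session with the given GPU options and hand it to Keras."""
    # Imported lazily so the helper can be defined without loading TensorFlow.
    import tensorflow as tf
    from keras import backend as K

    # Named tf_config so it does not clash with the Mask R-CNN `config` object.
    tf_config = tf.ConfigProto()
    # Allocate GPU memory on demand instead of reserving it all up front.
    tf_config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        # Optional upper bound on the share of GPU memory this process may use.
        tf_config.gpu_options.per_process_gpu_memory_fraction = memory_fraction

    session = tf.Session(config=tf_config)
    # Make Keras use this session for everything it builds afterwards.
    K.set_session(session)
    return session
```

If used, `configure_keras_session()` would be called once, before `modellib.MaskRCNN(...)` is constructed.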

I still get the warnings, and the program does not exit:

 Stats: 
Limit:                  2107527168
InUse:                   436697856
MaxInUse:                683784192
NumAllocs:                    1722
MaxAllocSize:            170917888

2021-06-15 12:58:33.338214: W tensorflow/core/common_runtime/bfc_allocator.cc:427] ***********************xx**_**************__________________________________________________________
2021-06-15 12:58:33.680104: W tensorflow/core/common_runtime/bfc_allocator.cc:305] Garbage collection: deallocate free memory regions (i.e.,allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently,you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
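
The allocator statistics printed above can be turned into more readable numbers. The sketch below parses the `Name: value` lines (copied from the log) and reports how much of the allocator's limit was in use; `parse_allocator_stats` and `to_mib` are hypothetical helpers, and reading `Limit` as the cap on what the BFC allocator may hand out is an interpretation of the log, not something the log states.

```python
import re

# Statistics copied verbatim from the log output above.
ALLOCATOR_STATS = """\
Limit:                  2107527168
InUse:                   436697856
MaxInUse:                683784192
NumAllocs:                    1722
MaxAllocSize:            170917888
"""


def parse_allocator_stats(text):
    """Return a dict mapping each 'Name: value' line to an integer."""
    return {name: int(value) for name, value in re.findall(r"(\w+):\s+(\d+)", text)}


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


stats = parse_allocator_stats(ALLOCATOR_STATS)
in_use_fraction = stats["InUse"] / stats["Limit"]

print(f"Limit:    {to_mib(stats['Limit']):.0f} MiB")
print(f"InUse:    {to_mib(stats['InUse']):.0f} MiB")
print(f"MaxInUse: {to_mib(stats['MaxInUse']):.0f} MiB")
print(f"In use:   {in_use_fraction:.1%} of the limit")
```

Read this way, the limit is a little under 2 GiB, and only about a fifth of it was in use when the statistics were printed. On Jetson boards the CPU and GPU share the same physical memory, so the limit may also reflect how much system memory was free when the session started.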

How can I check whether my GPU is allocating memory correctly?

[1]: https://github.com/matterport/Mask_RCNN
