Problem description
I am trying to train a Mask R-CNN image segmentation model on a custom dataset in MS-COCO format.
I am trying to use polygon masks as input, but I cannot get them into a format my model accepts.
My data looks like this:
{"id": 145010, "image_id": 101953, "category_id": 1040,
"segmentation": [[140.0,352.5,131.0,351.5,118.0,344.5,101.50000000000001,323.0,94.5,303.0,86.5,292.0,52.0,263.5,35.0,255.5,20.5,240.0,11.5,214.0,14.5,190.0,22.0,179.5,53.99999999999999,170.5,76.0,158.5,88.5,129.0,100.5,111.0,152.0,70.5,175.0,65.5,217.0,64.5,272.0,48.5,296.0,56.49999999999999,320.5,82.0,350.5,135.0,374.5,163.0,382.5,190.0,381.5,205.99999999999997,376.5,217.0,371.0,221.5,330.0,229.50000000000003,312.5,240.0,310.5,291.0,302.5,310.0,288.0,326.5,259.0,337.5,208.0,339.5,171.0,349.5]],
"area": 73578.0,
"bbox": [11.5, 11.5, 341.0, 371.0],
"iscrowd": 0}
There is a single object in this image, so there is one entry each for segmentation and bbox. The segmentation values are the pixel coordinates of the polygon, so the list has a different length for each object.
Can anyone help me with this?
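(For reference, in COCO the bbox field is [x, y, width, height], normally the tight axis-aligned box around the segmentation polygon. A small made-up helper illustrates the relationship between the flat [x0, y0, x1, y1, ...] polygon list and the box:)

```python
def polygon_to_bbox(polygon):
    """Compute a COCO-style [x, y, width, height] box from a flat
    [x0, y0, x1, y1, ...] polygon list. Illustrative helper only."""
    xs = polygon[0::2]  # even indices hold x coordinates
    ys = polygon[1::2]  # odd indices hold y coordinates
    x, y = min(xs), min(ys)
    return [x, y, max(xs) - x, max(ys) - y]

print(polygon_to_bbox([10.0, 20.0, 30.0, 25.0, 15.0, 40.0]))
# [10.0, 20.0, 20.0, 20.0]
```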
Solution
To work with COCO-format datasets you can use this repo. It provides classes that can be instantiated from the annotation file, making it very easy to use and to access the data.
I don't know which implementation you are using, but if it is similar to this tutorial, this code should at least give you some ideas about how to solve your problem:
import os

import torch
from PIL import Image
from pycocotools.coco import COCO


class CocoDataset(torch.utils.data.Dataset):
    def __init__(self, dataset_dir, subset, transforms):
        dataset_path = os.path.join(dataset_dir, subset)
        ann_file = os.path.join(dataset_path, "annotation.json")
        self.imgs_dir = os.path.join(dataset_path, "images")
        self.coco = COCO(ann_file)
        self.img_ids = self.coco.getImgIds()
        self.transforms = transforms

    def __getitem__(self, idx):
        '''
        Args:
            idx: index of the sample to be fed
        Returns:
            a tuple of:
            - PIL Image of shape (H, W)
            - target (dict) containing:
                - boxes: FloatTensor[N, 4], N being the number of instances,
                  with bounding box coordinates in [x0, y0, x1, y1] format,
                  ranging from 0 to W and 0 to H;
                - labels: Int64Tensor[N], class label (0 is background);
                - image_id: Int64Tensor[1], unique id for each image;
                - area: Tensor[N], area of each bbox;
                - iscrowd: UInt8Tensor[N], True or False;
                - masks: UInt8Tensor[N, H, W], segmentation maps;
        '''
        img_id = self.img_ids[idx]
        img_obj = self.coco.loadImgs(img_id)[0]
        anns_obj = self.coco.loadAnns(self.coco.getAnnIds(imgIds=img_id))
        img = Image.open(os.path.join(self.imgs_dir, img_obj['file_name']))
        # list comprehensions may be slow here; consider vectorizing if needed
        # COCO boxes are [x, y, w, h]; convert to the [x0, y0, x1, y1]
        # format the model expects
        bboxes = [[b[0], b[1], b[0] + b[2], b[1] + b[3]]
                  for b in (ann['bbox'] for ann in anns_obj)]
        masks = [self.coco.annToMask(ann) for ann in anns_obj]
        areas = [ann['area'] for ann in anns_obj]
        boxes = torch.as_tensor(bboxes, dtype=torch.float32)
        # every instance is labelled as class 1 here; use ann['category_id']
        # instead if you have more than one class
        labels = torch.ones(len(anns_obj), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)
        image_id = torch.tensor([idx])
        area = torch.as_tensor(areas)
        iscrowd = torch.zeros(len(anns_obj), dtype=torch.int64)
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.img_ids)
Again, this is only a draft meant to give you some hints.
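One more sketch, under the assumption you are training a torchvision-style Mask R-CNN: since images and targets in a batch have different shapes and cannot be stacked into one tensor, you need a custom collate function when wrapping the dataset in a DataLoader. The dataset paths below are hypothetical:

```python
import torch
from torch.utils.data import DataLoader

def collate_fn(batch):
    # keep images and targets as tuples instead of stacking them into tensors
    return tuple(zip(*batch))

# Hypothetical usage; "data"/"train" stand in for your real paths:
# dataset = CocoDataset("data", "train", transforms=None)
# loader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=collate_fn)
# images, targets = next(iter(loader))  # tuples of images and target dicts
```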