Data augmentation with Python

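Question

I am currently working on a CNN-related project and I am new to this field. I have a set of 500 images of fabric defects. How can I increase the number of images to 2000? Is there a library I can use for this?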

Answer

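There are several data augmentation techniques, such as scaling, mirroring, rotation, and cropping. The idea is to create new images from your initial image set, so that the model has to cope with the variation these transformations introduce.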
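To make the idea concrete, here is a minimal sketch of mirroring and 90-degree rotation using only NumPy. The function name `basic_augmentations` and the synthetic input array are illustrative assumptions, not part of any library; in practice you would load your own fabric images instead.

```python
import numpy as np

def basic_augmentations(image: np.ndarray) -> dict:
    """Return a few simple variants of an H x W (x C) image array."""
    return {
        "mirror_horizontal": image[:, ::-1],  # flip left-right
        "mirror_vertical": image[::-1, :],    # flip top-bottom
        "rotate_90": np.rot90(image, k=1),    # rotate 90 degrees counter-clockwise
        "rotate_180": np.rot90(image, k=2),   # rotate 180 degrees
    }

# Synthetic stand-in for a fabric image (assumption: 8-bit RGB, 64 x 48 pixels)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 48, 3), dtype=np.uint8)
variants = basic_augmentations(image)
```

Each variant is a new training sample derived from the same original, which is how a small set can be multiplied several times over.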

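There are several ways to do this. One option is OpenCV. Another is Keras on top of TensorFlow, which provides built-in high-level functions for data generation. A third is scikit-image.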

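I suggest starting with simple, effective techniques such as mirroring and random cropping, and then moving on to color or contrast augmentation.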
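A minimal NumPy sketch of a random crop followed by a linear contrast change might look like this. The function names `random_crop` and `adjust_contrast`, the crop size, and the contrast factor are hypothetical choices for illustration only.

```python
import numpy as np

def random_crop(image, out_height, out_width, rng):
    """Cut a randomly positioned window of the requested size out of the image."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - out_height + 1))
    left = int(rng.integers(0, w - out_width + 1))
    return image[top:top + out_height, left:left + out_width]

def adjust_contrast(image, factor):
    """Scale each pixel's deviation from the mean intensity by `factor`."""
    pixels = image.astype(np.float64)
    mean = pixels.mean()
    out = (pixels - mean) * factor + mean
    # Clip to the valid 8-bit range and round before converting back
    return np.rint(np.clip(out, 0, 255)).astype(np.uint8)

# Synthetic stand-in for a fabric image (assumption: 8-bit RGB, 100 x 80 pixels)
rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(100, 80, 3), dtype=np.uint8)

cropped = random_crop(image, 90, 72, rng)   # keep 90% of each side
contrasted = adjust_contrast(cropped, 1.5)  # factor > 1 increases contrast
```

For defect images it is worth checking that the crop does not cut the defect out of the frame, otherwise the label no longer matches the image.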


The go-to library for image augmentation is imgaug.

The documentation is self-explanatory, but here is an example:


import numpy as np
from imgaug import augmenters as iaa
from PIL import Image

# load the image and convert it to a NumPy array
image = np.array(Image.open("<path to image>"))

# the augmenter accepts a list of images, so wrap the single image in a list;
# for this demonstration we only use one image
image = [image]

# the augmenters below are applied randomly: some with a given probability,
# others with a randomly sampled strength
augmenter = iaa.Sequential([
    iaa.Fliplr(0.5),  # horizontal flip for 50% of the images
    iaa.Crop(percent=(0, 0.1)),  # random crops of 0-10% per side
    iaa.Sometimes(0.5, iaa.GaussianBlur(sigma=(0, 0.5))),  # blur 50% of the images
    iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05 * 255), per_channel=0.5),
], random_order=True)  # apply augmenters in random order

augmented_image = augmenter(images=image)

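augmented_image is now a list containing one augmented version of the original image. Since you said you want to create 2000 images from 500, you can augment each image 4 times (500 × 4 = 2000), like this: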


# image_paths is assumed to be a list with the paths of the 500 original images
total_images = []
for image_path in image_paths:
    # load the image and convert it to a NumPy array
    image = np.array(Image.open(image_path))

    # create a list with four copies of the same image
    images = [image for _ in range(4)]

    # pass it into the augmenter and get 4 different augmentations
    augmented_images = augmenter(images=images)

    # add all images to a list, or save them some other way
    total_images += augmented_images
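If you would rather write the augmented images to disk than keep them in a list, something along these lines should work. This is only a sketch: the helper name `save_images`, the file naming scheme, and the temporary output directory are assumptions, and the random arrays stand in for the contents of `total_images`.

```python
import os
import tempfile

import numpy as np
from PIL import Image

def save_images(images, output_dir, prefix="aug"):
    """Write each image array as a PNG file and return the file paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for index, array in enumerate(images):
        path = os.path.join(output_dir, f"{prefix}_{index:04d}.png")
        Image.fromarray(array).save(path)  # PNG is lossless, so pixels are preserved
        paths.append(path)
    return paths

# Stand-in for total_images (assumption: 8-bit RGB arrays)
rng = np.random.default_rng(0)
demo_images = [rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8) for _ in range(4)]

output_dir = tempfile.mkdtemp()  # replace with your own output folder
saved_paths = save_images(demo_images, output_dir)
```

Saving to disk keeps memory usage low and lets you inspect the augmented images before training.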

cnn data-augmentation
