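Problem description
I am using Google Colaboratory and TensorFlow to train a neural network that classifies images of cats and dogs, and I train on my data with model.fit_generator. The data loads fine, but in some epoch, once it starts iterating through the validation steps, I get the error described in the title: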
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f347160a0f8>
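The cat and dog images I am using were downloaded from Kaggle.
I have seen some solutions that use PIL on a single image in a Jupyter notebook, but given that Google Colab uses PIL implicitly, how do I handle this error for every image on Google Colab?
Here is my code: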
from google.colab import files
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D,MaxPooling2D
from keras.layers import Activation,Dropout,Flatten,Dense
from keras import backend as K
import numpy as np
from keras.preprocessing import image
from google.colab import drive
drive.mount('/content/drive')
img_width,img_height = 150,150
train_data_dir = '/content/drive/My Drive/data/train'
validation_data_dir = '/content/drive/My Drive/data/validation'
nb_train_samples = 1000
nb_validation_Samples = 100
epochs = 10
batch_size = 20
if K.image_data_format() == 'channels_first':
    input_shape = (3,img_width,img_height)
else:
    input_shape = (img_width,img_height,3)
train_datagen = ImageDataGenerator(
rescale= 1. / 255,shear_range = 0.2,zoom_range=0.2,horizontal_flip=True
)
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
train_data_dir,target_size=(img_width,img_height),batch_size=batch_size,class_mode='binary')
model = Sequential()
model.add(Conv2D(32,(3,3),input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.summary()
model.add(Conv2D(32,(3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64,(2,2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.summary()
model.compile(loss='binary_crossentropy',optimizer='rmsprop',metrics=['accuracy'])
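# Build the validation generator before fit_generator uses it; target_size must match input_shape
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,target_size=(img_width,img_height),batch_size=batch_size,class_mode="binary")
model.fit_generator(
    train_generator,steps_per_epoch=nb_train_samples // batch_size,epochs=epochs,validation_data = validation_generator,validation_steps = nb_validation_Samples // batch_size)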
The error itself occurs at this point:
model.fit_generator(
train_generator,validation_steps = nb_validation_Samples // batch_size)
Specifically, on this line:
validation_steps = nb_validation_Samples // batch_size)
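For context, the same exception can be reproduced outside of training. The `<_io.BytesIO object ...>` in the message suggests that the image loader hands the raw file bytes to PIL, which then fails to recognise them; that reading is an assumption, not something confirmed from the traceback. A minimal sketch, assuming Pillow 7.0 or later (where `UnidentifiedImageError` exists); `can_identify` is a hypothetical helper name and the byte string is a made-up stand-in for a damaged file:

```python
import io

from PIL import Image, UnidentifiedImageError


def can_identify(data: bytes) -> bool:
    """Return True if Pillow recognises the bytes as a supported image format."""
    try:
        # Image.open only reads the header, which is enough to trigger the error.
        with Image.open(io.BytesIO(data)):
            return True
    except UnidentifiedImageError:
        return False


# A tiny in-memory PNG stands in for a healthy image file.
buffer = io.BytesIO()
Image.new("RGB", (2, 2)).save(buffer, format="PNG")

print(can_identify(buffer.getvalue()))         # bytes of a real image
print(can_identify(b"this is not an image"))   # stand-in for a corrupt or non-image file
```

If that reading is right, a single unreadable file anywhere in the validation directory would be enough to stop the generator when its batch is loaded.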
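Solution
If the dataset was downloaded from Microsoft, you can use the script below to clean it up. As the comment in the script indicates, it is mostly adapted from another SO thread.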
#!/usr/bin/env python
# https://stackoverflow.com/questions/63754311/unidentifiedimageerror-cannot-identify-image-file
# 1st in the answers
import os
from PIL import Image
folder_path = r'raw\PetImages'
extensions = []
for fldr in os.listdir(folder_path):
    sub_folder_path = os.path.join(folder_path,fldr)
    for filee in os.listdir(sub_folder_path):
        file_path = os.path.join(sub_folder_path,filee)
        print('** Path: {} **'.format(file_path),end="\r",flush=True)
        try:
            # Opening and converting forces PIL to decode the whole file
            im = Image.open(file_path)
            rgb_im = im.convert('RGB')
            # Record the extension of every file that decoded successfully
            ext = os.path.splitext(filee)[1].lstrip('.')
            if ext not in extensions:
                extensions.append(ext)
        except Exception:
            print("\nWrong format file: ",file_path,flush=True)
print("\nValid extensions: ",repr(extensions))
'''
** Path: raw\PetImages\Cat\666.jpg **
Wrong format file: raw\PetImages\Cat\666.jpg
** Path: raw\PetImages\Cat\Thumbs.db **
Wrong format file: raw\PetImages\Cat\Thumbs.db
** Path: raw\PetImages\Dog\11702.jpg **
Wrong format file: raw\PetImages\Dog\11702.jpg
** Path: raw\PetImages\Dog\9057.jpg **
D:\penv38\lib\site-packages\PIL\TiffImagePlugin.py:811: UserWarning: Truncated File Read
  warnings.warn(str(msg))
** Path: raw\PetImages\Dog\Thumbs.db **
Wrong format file: raw\PetImages\Dog\Thumbs.db
Valid extensions: ['jpg']
Thus exclude these files:
Cat\666.jpg
Dog\11702.jpg
Dog\9057.jpg
'''
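The manual exclusion step could also be automated, which may be more convenient on Colab where the data sits on Google Drive. The following is only a sketch under a few assumptions: `find_unreadable_images` and `quarantine` are hypothetical helper names (not part of PIL or Keras), files are moved rather than deleted, and PIL warnings such as the "Truncated File Read" one above are treated as failures by default, mirroring the decision to exclude Dog\9057.jpg.

```python
import os
import shutil
import warnings

from PIL import Image


def find_unreadable_images(root_dir, warnings_as_errors=True):
    """Return a sorted list of files under root_dir that PIL cannot fully decode."""
    bad_paths = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with warnings.catch_warnings():
                    if warnings_as_errors:
                        # Turn warnings (e.g. truncated reads) into exceptions.
                        warnings.simplefilter("error")
                    with Image.open(path) as im:
                        im.convert("RGB")  # forces a full decode of the pixel data
            except Exception:
                bad_paths.append(path)
    return sorted(bad_paths)


def quarantine(bad_paths, root_dir, quarantine_dir):
    """Move each bad file to quarantine_dir, keeping its path relative to root_dir."""
    for path in bad_paths:
        target = os.path.join(quarantine_dir, os.path.relpath(path, root_dir))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.move(path, target)
```

With the paths from the question, one might call `bad = find_unreadable_images('/content/drive/My Drive/data/validation')` and then `quarantine(bad, '/content/drive/My Drive/data/validation', '/content/drive/My Drive/quarantine')`, repeating the same for the train directory. The quarantine location is an invented example; it should lie outside the data directories so that flow_from_directory does not pick the files up again.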