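Python KeyError when using flow_from_dataframe

Problem description

I have a dataset in which the image files are provided separately and their labels are given in a separate CSV file, with the image file name in the first column and the corresponding label in the second. My code is below.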

import pandas as pd
train= pd.read_csv('/content/drive/MyDrive/Colab_Notebooks/label_train.csv',dtype=str)
train.head()

number;label
0   101.jpg;3
1   102.jpg;1
2   103.jpg;3
3   104.jpg;3
4   105.jpg;2

test = pd.read_csv('/content/drive/MyDrive/Colab_Notebooks/label_test.csv',dtype=str)
test.head()

number;label
0   201.jpg;3
1   202.jpg;3
2   203.jpg;1
3   204.jpg;3
4   205.jpg;3

train_folder = '/content/drive/MyDrive/Colab_Notebooks/bilder_train'
test_folder = '/content/drive/MyDrive/Colab_Notebooks/bilder_test'

import os
import numpy as np
import glob
import shutil
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Activation,Conv2D,Flatten,Dropout,MaxPooling2D,BatchNormalization
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import regularizers,optimizers

train_gen = ImageDataGenerator(rescale=1./255,rotation_range=45,width_shift_range=.15,height_shift_range=.15,horizontal_flip=True,zoom_range=0.5)
test_gen = ImageDataGenerator(rescale=1./255)

train_data = train_gen.flow_from_dataframe(dataframe = train,directory = train_folder,x_col = 'number',y_col = 'label',seed = 42,batch_size = 10,shuffle = True,class_mode='categorical',target_size = (100,100))

test_data = test_gen.flow_from_dataframe(dataframe = test,directory = test_folder,x_col = 'number',y_col = None,shuffle = False,class_mode = None,target_size = (100,100))

This is the error message:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self,key,method,tolerance)
   2897             try:
-> 2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'number'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
6 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self,key,method,tolerance)
   2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
-> 2900                 raise KeyError(key) from err
   2901 
   2902         if tolerance is not None:

KeyError: 'number'

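I have no idea why this error occurs. Does anyone know what is going on here?

Solution

You need to pass sep=';' (the CSV separator) to pd.read_csv. Its default sep value is ',', so it reads number;label as a single column instead of two separate columns. As a result the DataFrame has no column named 'number', and flow_from_dataframe raises KeyError: 'number' when it looks up x_col.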
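To see the effect in isolation, here is a minimal sketch that parses a small in-memory sample instead of the real file. The sample text is made up for illustration and only mirrors the number;label layout shown in the question.

```python
import io
import pandas as pd

# Hypothetical sample that mimics the layout of label_train.csv
csv_text = "number;label\n101.jpg;3\n102.jpg;1\n103.jpg;3\n"

# Default separator (','): the whole header becomes one column name
default_df = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(list(default_df.columns))  # ['number;label']

# Explicit separator: the header splits into two columns
fixed_df = pd.read_csv(io.StringIO(csv_text), dtype=str, sep=';')
print(list(fixed_df.columns))    # ['number', 'label']
```

Printing train.columns right after reading the file is usually the quickest way to spot this kind of problem.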

import pandas as pd
train= pd.read_csv('/content/drive/MyDrive/Colab_Notebooks/label_train.csv',dtype=str,sep=';')
train.head()

test = pd.read_csv('/content/drive/MyDrive/Colab_Notebooks/label_test.csv',dtype=str,sep=';')
test.head()
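If you want a clearer message than a bare KeyError, you could check the column names before handing the DataFrame to flow_from_dataframe. The sketch below is only an illustration: check_columns is a hypothetical helper, not part of pandas or Keras, and the two small DataFrames are made-up stand-ins for the real files.

```python
import pandas as pd

def check_columns(df, required):
    """Hypothetical helper: raise a descriptive error if a column is missing."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(
            f"Missing columns {missing}; found {list(df.columns)}. "
            "Check the sep argument of pd.read_csv."
        )

# Made-up frames: one parsed correctly, one parsed with the wrong separator
good = pd.DataFrame({"number": ["101.jpg"], "label": ["3"]})
bad = pd.DataFrame({"number;label": ["101.jpg;3"]})

check_columns(good, ["number", "label"])  # passes silently

try:
    check_columns(bad, ["number", "label"])
except ValueError as err:
    print(err)  # names the missing columns and the ones actually present
```

Calling such a check on train and test before building the generators would point straight at the separator problem.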