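Problem description
I have a very basic PDF file that contains nothing but a table with columns and data. My code works fine until a column in the table contains both numbers and strings.
Here is an example of the incorrect output: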
Game Unnamed:0 rating Players
Final Fantasy VII nan Teen 1
Ganbare Goemon nan Everyone 2
nan 13 Mature 1
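How can I make tabula read a column as strings, so that it stops splitting a column that holds both numbers and strings into two separate columns?
Code: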
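import pandas as pd
import tabula

pdf_files = 'mypdf.pdf'
# With pages='all', read_pdf returns a list with one DataFrame per detected table.
df_list = tabula.read_pdf(pdf_files, pages='all', guess=False)
data_sheets = pd.DataFrame()
for idx, data in enumerate(df_list):
    if idx == 0:
        # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
        data_sheets = pd.concat([data_sheets, data])
        headers = data.columns
    else:
        # On later pages the first data row was parsed as the header:
        # move it back into the data and reuse the headers from the first page.
        data = data.T.reset_index().T.reset_index(drop=True)
        data.columns = headers
        # Assumed intent: the later pages are appended as well.
        data_sheets = pd.concat([data_sheets, data], ignore_index=True)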
Solution
Tabula does not attempt to detect data types, so the layout detection error is not caused by the data types in the column.
You can try specifying the horizontal coordinates of the column boundaries. In tabula-py this parameter is exposed as the columns keyword argument of the read_pdf() method.
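As a rough sketch of that suggestion, the snippet below passes explicit boundaries through the columns keyword argument of tabula.read_pdf. The helper names column_boundaries and read_with_fixed_columns and the coordinates in EXAMPLE_EXTENTS are hypothetical; the real x positions (in PDF points) have to be measured in your own file, for example with the Tabula desktop app. The boundary placement (halfway between neighbouring columns) is just one reasonable choice, not something tabula requires.

```python
from typing import List, Sequence, Tuple


def column_boundaries(extents: Sequence[Tuple[float, float]]) -> List[float]:
    """Return the x coordinate halfway between each pair of adjacent columns.

    `extents` holds the (left, right) x positions of every column in PDF
    points, ordered from left to right.
    """
    boundaries = []
    for (_, right), (next_left, _) in zip(extents, extents[1:]):
        boundaries.append((right + next_left) / 2)
    return boundaries


def read_with_fixed_columns(path: str, extents: Sequence[Tuple[float, float]]):
    """Read all pages of `path`, splitting columns only at the given boundaries."""
    import tabula  # imported lazily because tabula-py needs a Java runtime

    return tabula.read_pdf(
        path,
        pages="all",
        guess=False,  # same setting as in the question
        columns=column_boundaries(extents),
    )


# Hypothetical extents for the Game, rating and Players columns.
EXAMPLE_EXTENTS = [(36.0, 250.0), (270.0, 340.0), (360.0, 420.0)]
print(column_boundaries(EXAMPLE_EXTENTS))
```

Calling read_with_fixed_columns('mypdf.pdf', EXAMPLE_EXTENTS) would then return the list of DataFrames as before, but with the Game column kept in one piece as long as the boundaries match the actual layout.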