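Problem description
I have an Excel workbook with multiple tabs, and each tab contains multiple tables. I want to read the file so that every table from every tab is read, for example:
Tab1 has five tables in it.
Tab2 has ten tables in it.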
.....
.....
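I want to read each of these tables into a pandas DataFrame and then save it to a SQL database. I already know how to read multiple tabs from an Excel workbook.
Can anyone help me here, or point me in a direction where I can find some leads?
The tables in the tabs are predefined and have names. This is what each tab looks like (image: "Tab from excel sheet").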
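Solution
You may need to adapt this to match your data; for instance, consider what happens if some tables sit below others. Hopefully this points you in the right direction. Also, note the number of for loops I used; I am sure you can do better and optimize it further.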
import pandas as pd
from openpyxl import load_workbook
from collections import defaultdict
from itertools import product, groupby
from operator import itemgetter

wb = load_workbook(filename="test.xlsx")
sheet = wb["Sheet1"]
green_rows = defaultdict(list)
rest_data = []
for row in sheet:
    for cell in row:
        # look for the green rows; they contain the headers
        if cell.fill.fgColor.rgb == "FFA2D722":
            # take advantage of the fact that the header
            # is the first entry in that row
            if cell.value:
                val = cell.value
            green_rows[(val, cell.row)].append(cell.column)
        else:
            if cell.value not in (None, ""):  # so the 0s are not lost
                rest_data.append((cell.row, cell.column, cell.value))

# get the max and minimum column positions
# note the addition of 1 to the max;
# this is necessary when iterating to sort the data
# in the next section
green_rows = [
    (name, row, range(min(value), max(value) + 1))
    for (name, row), value in green_rows.items()
]

box = []
# here the green rows and the rest of the data
# are combined, then filtered for the respective
# sections
combo = product(green_rows, rest_data)
for (header, header_row, header_column_range), (
    cell_row,
    cell_column,
    cell_value,
) in combo:
    # this is where the filtration occurs
    if (header_row < cell_row) and (cell_column in header_column_range):
        box.append((header, cell_row, cell_value))

final = defaultdict(list)
content = groupby(box, itemgetter(1, 0))
# another iteration to get the final result
for key, value in content:
    final[key[-1]].append([val[-1] for val in value])
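The last grouping step is compact, so it may help to see it on toy data. The following is a minimal sketch that uses only the standard library; the `box` entries here are made up for illustration and simply mimic the `(header, row, value)` tuples that the loop above produces.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical (header, row, value) tuples, in the order the loop above
# emits them: all cells of one header first, rows ascending within it.
box = [
    ("Table A", 2, "x"), ("Table A", 2, "y"),
    ("Table A", 3, 1), ("Table A", 3, 2),
    ("Table B", 2, "p"), ("Table B", 3, 7),
]

final = defaultdict(list)
# groupby only merges *adjacent* items with equal keys, so box must already
# be ordered by header and then by row. The key here is (row, header).
for (row, header), cells in groupby(box, itemgetter(1, 0)):
    # each group becomes one row of that header's table
    final[header].append([value for _, _, value in cells])

print(dict(final))
```

Each key of `final` is a table name and each value is a list of rows, which is the shape `pd.DataFrame` accepts directly.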
You can create a DataFrame for each header:
pd.DataFrame(final["Address Association"])
    0                         1                   2                             3           4                         5
 0  Column Name in DB         Name                Description                   SortOrder   BusinessMeaningName       Obsolete
 1  Field Type                nvarchar(100)       nvarchar(255)                 int         nvarchar(50)              bit
 2  Mandatory                 Yes                 Yes                           Yes         No                        Yes
 3  Foreign Key               -                   -                             -           -                         -
 4  Optional Feature          -                   -                             -           -                         -
 5  Field Name in U4SM        Name                Description                   Sort Order  Business Meaning Name     Obsolete
 6  Address.Primary           Primary             Use this address by default.  1           Address.Primary           0
 7  Address.Billing           Billing             address for billing.          2           Address.Billing           0
 8  Address.Emergency         Emergency           use this for emergency.       3           Address.Emergency         0
 9  Address.Emergency SMS     Emergency SMS       use this for emergency SMS.   4           Address.Emergency SMS     0
10  Address.Deceased          Deceased            address for deceased.         5           Address.Deceased          0
11  Address.Home              Home                address for home.             8           Address.Home              0
12  Address.Mailing           Mailing             address for mailing.          9           Address.Mailing           0
13  Address.Mobile            Mobile              use this for mobile.          10          Address.Mobile            0
14  Address.School            School              address for school.           13          Address.School            0
15  Address.SMS               SMS                 use this for SMS text.        15          Address.SMS               0
16  Address.Work              Work                address for work              16          Address.Work              0
17  Address.Permanent         Permanent           Permanent Address             17          Address.Permanent         0
18  Address.HallsOfResidence  Halls of Residence  Halls of Residence            18          Address.HallsOfResidence  0
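The question also asks about saving each table to a SQL database. Below is a minimal sketch of one way to do that with `DataFrame.to_sql` and an in-memory SQLite database. The data is a made-up stand-in for `final`, the helper `rows_to_frame` is a hypothetical name, and the sketch assumes the first row of each table holds the column names, which may not match your sheets (in the output above, for example, the rows are metadata rather than a single header row).

```python
import sqlite3

import pandas as pd

# Toy stand-in for `final`: table name -> list of rows (hypothetical data).
final = {
    "Address Association": [
        ["Name", "Description", "SortOrder"],
        ["Primary", "Use this address by default.", 1],
        ["Billing", "address for billing.", 2],
    ],
}


def rows_to_frame(rows):
    """Use the first row as column names and the remaining rows as data."""
    header, *body = rows
    return pd.DataFrame(body, columns=header)


# In-memory database for illustration; a file path or a SQLAlchemy
# engine could be used here instead.
con = sqlite3.connect(":memory:")
for name, rows in final.items():
    # Assumption: derive a SQL-friendly table name from the header text.
    table_name = name.lower().replace(" ", "_")
    rows_to_frame(rows).to_sql(table_name, con, if_exists="replace", index=False)

# Read one table back to check what was stored.
stored = pd.read_sql("SELECT * FROM address_association", con)
print(stored)
```

To cover every tab, the same steps could be repeated for each worksheet in `wb.worksheets`, possibly prefixing the table name with the sheet title to avoid clashes.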