How to read multiple tables from each tab of an Excel sheet in Python?

Problem description

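I have an Excel sheet with multiple tabs, and each individual tab has multiple tables in it. I want to read the file in such a way that every table is read from every tab of the sheet, for example: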

Tab1 has five tables in it.
Tab2 has ten tables in it.
.....
.....

I want to read each of these tables into a pandas dataframe and then save it to a SQL database. I already know how to read multiple tabs from an Excel sheet.

Can anyone help me here, or point me in a direction where I can find some leads?

The tables in each tab are predefined and have names. This is what each tab looks like: (image: Tab from excel sheet)

Solution

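You may need to tweak this to match your data; imagine, for instance, that some tables sit below others and some above. Hopefully this points you in the right direction. Also, note the number of for loops I used; I am sure you can do better and optimize it further.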

from collections import defaultdict
from itertools import product, groupby
from operator import itemgetter

import pandas as pd
from openpyxl import load_workbook

wb = load_workbook(filename="test.xlsx")

sheet = wb["Sheet1"]

green_rows = defaultdict(list)
rest_data = []

for row in sheet:
    for cell in row:
        # look for the green rows; they contain the headers
        if cell.fill.fgColor.rgb == "FFA2D722":
            # take advantage of the fact that header 
            # is the first entry in that row
            if cell.value:
                val = cell.value
            green_rows[(val,cell.row)].append(cell.column)
        else:
            if cell.value not in (None,""): # so the 0s are not lost
                rest_data.append((cell.row,cell.column,cell.value))

# get the minimum and maximum column positions of each header
# note the addition of 1 to the max: range() excludes its end point,
# and the range is used to match cells to their header
# in the next section
green_rows = [
    (name,row,range(min(value),max(value) + 1))
    for (name,row),value in green_rows.items()
]


box = []

# here the green rows and the rest of the data
# are combined,then filtered for the respective 
# sections
combo = product(green_rows,rest_data)
for (header,header_row,header_column_range),(
    cell_row,cell_column,cell_value,) in combo:
    # this is where the filtration occurs
    if (header_row < cell_row) and (cell_column in header_column_range):
        box.append((header,cell_row,cell_value))

final = defaultdict(list)
content = groupby(box,itemgetter(1,0))

# another iteration to get the final result
for key,value in content:
    final[key[-1]].append([val[-1] for val in value])
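
The last step may be easier to follow on a toy input. The sketch below uses made-up (header, row, value) triples in the same shape as box; the names "Table A" and "Table B" are hypothetical. Keep in mind that itertools.groupby only merges adjacent items with equal keys, which is fine here because product yields the cells header by header and, within a header, row by row.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical (header, row, value) triples, shaped like the entries of `box`.
box = [
    ("Table A", 2, "id"), ("Table A", 2, "name"),
    ("Table A", 3, 1), ("Table A", 3, "x"),
    ("Table B", 2, "code"),
    ("Table B", 3, "k1"),
]

final = defaultdict(list)
# The key is (row, header); every run of equal keys becomes one table row.
for (row, header), cells in groupby(box, key=itemgetter(1, 0)):
    final[header].append([value for _, _, value in cells])

print(dict(final))
```

If the input were not already ordered this way, it would need to be sorted by header and row before grouping.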

You can create a dataframe for each header:

pd.DataFrame(final["Address Association"])


    0                         1                   2                             3           4                         5
0   Column Name in DB         Name                Description                   SortOrder   BusinessMeaningName       Obsolete
1   Field Type                nvarchar(100)       nvarchar(255)                 int         nvarchar(50)              bit
2   Mandatory                 Yes                 Yes                           Yes         No                        Yes
3   Foreign Key               -                   -                             -           -                         -
4   Optional Feature          -                   -                             -           -                         -
5   Field Name in U4SM        Name                Description                   Sort Order  Business Meaning Name     Obsolete
6   Address.Primary           Primary             Use this address by default.  1           Address.Primary           0
7   Address.Billing           Billing             address for billing.          2           Address.Billing           0
8   Address.Emergency         Emergency           use this for emergency.       3           Address.Emergency         0
9   Address.Emergency SMS     Emergency SMS       use this for emergency SMS.   4           Address.Emergency SMS     0
10  Address.Deceased          Deceased            address for deceased.         5           Address.Deceased          0
11  Address.Home              Home                address for home.             8           Address.Home              0
12  Address.Mailing           Mailing             address for mailing.          9           Address.Mailing           0
13  Address.Mobile            Mobile              use this for mobile.          10          Address.Mobile            0
14  Address.School            School              address for school.           13          Address.School            0
15  Address.SMS               SMS                 use this for SMS text.        15          Address.SMS               0
16  Address.Work              Work                address for work              16          Address.Work              0
17  Address.Permanent         Permanent           Permanent Address             17          Address.Permanent         0
18  Address.HallsOfResidence  Halls of Residence  Halls of Residence            18          Address.HallsOfResidence  0
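
To finish the task from the question, each dataframe can then be written to a SQL database with DataFrame.to_sql. This is only a minimal sketch under assumptions: it uses an in-memory SQLite database, a hypothetical table name address_association, and a small made-up list of rows whose first row holds the column labels. In a sheet like the one above, the rows that describe field types and flags would need to be skipped or handled separately.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for one entry of `final`: a list of rows in which
# the first row carries the column labels.
rows = [
    ["Name", "Description", "SortOrder"],
    ["Primary", "Use this address by default.", 1],
    ["Billing", "address for billing.", 2],
]

# Promote the first row to column labels instead of keeping 0, 1, 2, ...
df = pd.DataFrame(rows[1:], columns=rows[0])

# An in-memory SQLite database keeps the sketch self-contained; a file path
# or an SQLAlchemy engine for another database could be used instead.
conn = sqlite3.connect(":memory:")
df.to_sql("address_association", conn, index=False, if_exists="replace")

# Read the table back to check what was stored.
stored = pd.read_sql("SELECT * FROM address_association", conn)
conn.close()
```

Looping over final.items() and deriving one table name per header would store every table found on the sheet.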

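The question says the tables are predefined and have names. If that means they are real Excel table objects (created with Insert > Table), and not just colored header rows, a shorter route may be available: openpyxl exposes them per worksheet through ws.tables. The sketch below is an assumption-laden alternative, not the method used above. It needs a reasonably recent openpyxl, and it builds a tiny hypothetical workbook in memory (tab "Tab1", table "People") so that it runs on its own.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.worksheet.table import Table

# Hypothetical workbook built in memory; with a real file this would be
# wb = load_workbook("test.xlsx") instead.
wb = Workbook()
ws = wb.active
ws.title = "Tab1"
for row in [["id", "name"], [1, "a"], [2, "b"]]:
    ws.append(row)
ws.add_table(Table(displayName="People", ref="A1:B3"))

frames = {}
for sheet in wb.worksheets:
    # sheet.tables is a dict-like collection of the sheet's Table objects
    for table in sheet.tables.values():
        # table.ref is a range such as "A1:B3"; indexing the sheet with it
        # returns the cells row by row
        rows = [[cell.value for cell in row] for row in sheet[table.ref]]
        # the first row of an Excel table range holds the column headers
        frames[(sheet.title, table.displayName)] = pd.DataFrame(
            rows[1:], columns=rows[0]
        )
```

Each key of frames is a (tab name, table name) pair, so every table on every tab ends up in its own dataframe.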