Concatenating column values in a CSV based on a grouping condition

Problem description

My CSV looks like this (note: the values in the Name column are not restricted, i.e. they are not limited to ABC and DEF):

Name,Type,Text
ABC,Type A,how
ABC,Type A,are
ABC,Type A,you
ABC,Type B,Your
ABC,Type B,Name?
DEF,Type A,I
DEF,Type A,am
DEF,Type A,good
DEF,Type B,I'm
DEF,Type B,Terminator
... and more

I want to create another CSV file that looks like this (that is, concatenate the Text column, grouped by the Name and Type columns):

Name,Text
ABC,how are you
ABC,Your Name?
DEF,I am good
DEF,I'm Terminator
..till the end

I am trying to write a Python script. My attempt so far:

TypeList = ['Type A','Type B']
with open("../doc1.csv",encoding='utf-8',newline='',mode="r") as myfile:
    
    g = csv.reader(myfile)

    with open("../doc2.csv",mode="w") as myfile:
        h = csv.writer(myfile)
        h.writerow(["Name","Text"])

        for row in g:
            if TypeList[0] in row[1]:    
               Concatenatedtext[0]= Concatenatedtext[0] + ' ' + row[1]

Can someone help me with this?

Solution

Grouping CSV rows together is a job for the itertools.groupby function.

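itertools.groupby takes a key function that defines which rows match. For each run of consecutive rows that share the same key, it emits the key (here, the name and type) and the group (the matching rows). Rows with the same key that are not adjacent end up in separate groups, so the input must already be ordered by that key.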
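The "consecutive rows" behaviour is easy to miss, so here is a minimal sketch of it. The rows and the name_and_type helper are made up for illustration and are not part of the question's file.

```python
import itertools

# Hypothetical sample rows (Name, Type, Text), not read from a real file.
rows = [
    ("ABC", "Type A", "how"),
    ("ABC", "Type A", "are"),
    ("DEF", "Type A", "I"),
    ("ABC", "Type A", "you"),  # same key as the first run, but not adjacent
]


def name_and_type(row):
    """Key function: rows with an equal (Name, Type) pair belong together."""
    return (row[0], row[1])


# groupby starts a new group every time the key changes,
# so the last ABC row forms a group of its own.
runs = [
    (key, [row[2] for row in group])
    for key, group in itertools.groupby(rows, key=name_and_type)
]

# Sorting by the same key first brings the separated rows together.
# sorted() is stable, so the original order inside each group is kept.
merged = [
    (key, [row[2] for row in group])
    for key, group in itertools.groupby(sorted(rows, key=name_and_type), key=name_and_type)
]

print(runs)
print(merged)
```

If the file in the question is already ordered by Name and Type, as the sample suggests, the sorting step is not needed.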

The operator.itemgetter function can be used to create the key function.
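As a small illustration of what itemgetter returns, the sketch below applies it to a single hand-written row (the row is an assumed example, not data from the file):

```python
import operator

# An assumed example row in the Name, Type, Text layout.
row = ["ABC", "Type A", "how"]

# With several indices, itemgetter returns a tuple of the selected items.
key_func = operator.itemgetter(0, 1)
key = key_func(row)

# With a single index, it returns the item itself rather than a tuple.
text = operator.itemgetter(2)(row)

print(key)   # ('ABC', 'Type A')
print(text)  # how
```

Because the key is a tuple, two rows land in the same group only when both their Name and their Type are equal.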

import csv
import itertools
import operator

# A function that gets the Name and Type values for each row:
# this is used to group the rows together.
key_func = operator.itemgetter(0,1)

with open('myfile.csv',newline='') as f:
    reader = csv.reader(f)
    # Skip header row
    next(reader)
    for key,group in itertools.groupby(reader,key=key_func):
        # Each item in the group is a full row; index 2 is its Text cell.
        text = ' '.join(row[2] for row in group)
        print([key[0],key[1],text])

Output:

['ABC', 'Type A', 'how are you']
['ABC', 'Type B', 'Your Name?']
['DEF', 'Type A', 'I am good']
['DEF', 'Type B', "I'm Terminator"]
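The question asks for a second CSV with only Name and Text columns, while the code above prints the groups. A possible way to write them out instead is sketched below. The write_grouped function name is my own, the sample data assumes every row carries a Type value, and io.StringIO stands in for the real files so the snippet runs on its own.

```python
import csv
import io
import itertools
import operator


def write_grouped(infile, outfile):
    """Write one Name,Text row per consecutive (Name, Type) run of the input."""
    # skipinitialspace drops a space that follows a comma, if the file has one.
    reader = csv.reader(infile, skipinitialspace=True)
    writer = csv.writer(outfile, lineterminator="\n")
    next(reader)  # skip the header row
    writer.writerow(["Name", "Text"])
    key_func = operator.itemgetter(0, 1)
    for (name, _type), group in itertools.groupby(reader, key=key_func):
        writer.writerow([name, " ".join(row[2] for row in group)])


# Assumed sample input in the Name, Type, Text layout.
sample = io.StringIO(
    "Name,Type,Text\n"
    "ABC,Type A,how\n"
    "ABC,Type A,are\n"
    "ABC,Type A,you\n"
    "ABC,Type B,Your\n"
    "ABC,Type B,Name?\n"
    "DEF,Type A,I\n"
    "DEF,Type A,am\n"
    "DEF,Type A,good\n"
    "DEF,Type B,I'm\n"
    "DEF,Type B,Terminator\n"
)
result = io.StringIO()
write_grouped(sample, result)
print(result.getvalue())
```

With real files, the two StringIO objects would be replaced by open(..., newline='') handles for the input and output paths, as the csv module documentation recommends.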
