Concatenate column values in a CSV based on a grouping condition

Problem description

My CSV looks like this (note: the values in the Name column are not restricted, i.e. they are not limited to ABC and DEF):

Name, Type, Text
ABC, Type A, how
ABC, Type A, are
ABC, Type A, you
ABC, Type B, Your
ABC, Type B, Name?
DEF, Type A, I
DEF, Type A, am
DEF, Type A, good
DEF, Type B, I'm
DEF, Type B, Terminator
... and more 

I want to create another CSV file like the one below (i.e., concatenate the Text column, grouped by the Name and Type columns):

Name,Text
ABC,how are you
ABC,Your Name?
DEF,I am good
DEF,I'm Terminator
..till the end

I am trying to write a Python script. My attempt so far:

TypeList = ['Type A','Type B']
with open("../doc1.csv",encoding='utf-8',newline='',mode="r") as myfile:
    
    g = csv.reader(myfile)

    with open("../doc2.csv",mode="w") as myfile:
        h = csv.writer(myfile)
        h.writerow(["Name","Text"])

        for row in g:
            if TypeList[0] in row[1]:    
               Concatenatedtext[0]= Concatenatedtext[0] + ' ' + row[1]

Can someone help me solve this?

Solution

Grouping CSV rows together is a job for the itertools.groupby function.

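itertools.groupby takes a key function that defines which rows belong together and, for each run of matching rows it finds, yields the key (here, the name and type) and the group (the matching rows). Note that groupby only merges consecutive rows with equal keys, which works here because the rows for each name and type are already adjacent in the file.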
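To make this behaviour concrete, here is a minimal sketch on an in-memory list. The sample rows and the names `rows` and `grouped` are made up for illustration; the last row deliberately repeats an earlier key to show that non-adjacent rows end up in separate groups.

```python
import itertools

# Hypothetical sample rows (name, type, text); not read from a real file.
rows = [
    ("ABC", "Type A", "how"),
    ("ABC", "Type A", "are"),
    ("ABC", "Type B", "Your"),
    ("ABC", "Type A", "you"),  # same key as the first two rows, but not adjacent
]

# Group on the first two fields; each group is a run of consecutive rows.
grouped = [
    (key, [row[2] for row in group])
    for key, group in itertools.groupby(rows, key=lambda row: (row[0], row[1]))
]

for key, texts in grouped:
    print(key, texts)
```

This yields three groups rather than two, so if matching rows may be scattered through the file, sort the rows by the same key before grouping.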

The operator.itemgetter function can be used to create the key function.
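As a quick illustration of what such a key function returns, the sketch below applies it to a single row. The `row` value is a hypothetical example, not data read from the file.

```python
import operator

# itemgetter(0, 1) builds a function that returns items 0 and 1 as a tuple.
key_func = operator.itemgetter(0, 1)

row = ["ABC", "Type A", "how"]  # hypothetical row for illustration
key = key_func(row)
print(key)  # ('ABC', 'Type A')
```

Because the key is a tuple of name and type, two rows fall into the same group only when both values match.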

import csv
import itertools
import operator

# A function that gets the Name and Type values for each row:
# this is used to group the rows together.
key_func = operator.itemgetter(0, 1)

with open('myfile.csv', newline='') as f:
    reader = csv.reader(f)
    # Skip header row
    next(reader)
    for key, group in itertools.groupby(reader, key=key_func):
        # Join the Text value (third column) of every row in the group
        text = ' '.join(row[2] for row in group)
        print([key[0], key[1], text])

Output:

['ABC', ' Type A', ' how  are  you']
['ABC', ' Type B', ' Your  Name?']
['DEF', ' Type A', ' I  am  good']
['DEF', ' Type B', " I'm  Terminator"]
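To get from this printed output to the Name,Text file the question asks for, the same grouping can feed a csv.writer. The following is a minimal sketch under a few assumptions: the input has a space after each comma (as the leading spaces in the output above suggest), so the reader is created with skipinitialspace=True; the function name `concat_by_group` is hypothetical; and io.StringIO objects stand in for the real input and output files.

```python
import csv
import io
import itertools
import operator


def concat_by_group(infile, outfile):
    """Write one Name,Text row per consecutive (Name, Type) group."""
    # skipinitialspace=True drops the space that follows each comma.
    reader = csv.reader(infile, skipinitialspace=True)
    # lineterminator="\n" keeps the demo output simple; the default is "\r\n".
    writer = csv.writer(outfile, lineterminator="\n")
    next(reader, None)  # skip the header row, if there is one
    writer.writerow(["Name", "Text"])
    key_func = operator.itemgetter(0, 1)
    for (name, _type), group in itertools.groupby(reader, key=key_func):
        writer.writerow([name, " ".join(row[2] for row in group)])


# Hypothetical sample data standing in for the real input file.
sample = io.StringIO(
    "Name, Type, Text\n"
    "ABC, Type A, how\n"
    "ABC, Type A, are\n"
    "ABC, Type A, you\n"
    "ABC, Type B, Your\n"
    "ABC, Type B, Name?\n"
)
result = io.StringIO()
concat_by_group(sample, result)
print(result.getvalue())
```

With real files, open both with newline='' and pass the file objects in place of the StringIO buffers.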