Loop over each item in a row and compare it with each item in another row, then save the result in a new column

Problem description

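I want to write a loop in Python that compares each item in a row with the items in the corresponding row of another column. If an item does not exist in the second column's row, it should be added to a new list, which will then be turned into another column (duplicates should also be dropped while appending, i.e. an item is added only if it is not already in c).

The goal is to compare the items in each row of one column with the items in the corresponding row of another column, and to save the unique values from the first column in a new column of the same df.

This is just an example; I have many items in each row.

I tried the code below, but nothing happened, and converting the list into a column did not work in my tests.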

a= df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()
c = []
for i in df.values:
    for i in a:
        if i in a:
            if i not in b:
                if i not in c:
                    c.append(i)
                    print(c)
                    df['new'] = pd.Series(c)

Any help would be much appreciated. Thanks in advance.

Solution

def parse_str_into_list(s):
    # "['a','b']" -> "a b"; any other string is returned unchanged
    if s.startswith('[') and s.endswith(']'):
        return ' '.join(s.strip('[]').strip("'").split("','"))
    return s

def filter_restrict_words(row):
    # row holds two cells: the target words first, the restricted words second
    targets = parse_str_into_list(row.iloc[0]).split(' ')
    restricts = parse_str_into_list(row.iloc[1]).split(' ')

    # use a list rather than a set so the words keep their original order
    words_to_keep = []
    for word in targets:
        # keep a word if it is not restricted, is 4 to 44 characters long, and is not already kept
        if word not in restricts and 3 < len(word) < 45 and word not in words_to_keep:
            words_to_keep.append(word)

    return ' '.join(words_to_keep)

# col_target and col_restrict are the names of the two columns to compare
df['FINAL_KEYWORDS'] = df[[col_target, col_restrict]].apply(filter_restrict_words, axis=1)
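The string parsing above splits only on `','` with no space after the comma. If the cells instead hold the default text form of a Python list (for example `"['The', 'North']"`, with a space after each comma), the items would not be separated. A minimal sketch of one alternative, assuming the cells are either real lists, list-like strings, or plain space-separated text; `parse_cell` is a hypothetical helper name, not part of the answer:

```python
import ast

def parse_cell(cell):
    """Turn one cell into a list of words (hypothetical helper)."""
    if isinstance(cell, list):
        # Already a real list: nothing to parse
        return cell
    text = cell.strip()
    if text.startswith('[') and text.endswith(']'):
        # ast.literal_eval safely evaluates a Python literal such as "['a', 'b']"
        return list(ast.literal_eval(text))
    # Fallback assumption: plain text with whitespace-separated words
    return text.split()

print(parse_cell("['The', 'North', 'Face']"))  # ['The', 'North', 'Face']
print(parse_cell("['shoes','North']"))         # ['shoes', 'North']
print(parse_cell("plain text cell"))           # ['plain', 'text', 'cell']
```

Because `ast.literal_eval` only accepts literals, it raises an error on malformed input instead of executing it, which is usually preferable to silent mis-splitting.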

So, given that you have these two variables:

a= df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()

Try something like this:

new = {}
for index,items in enumerate(a):
    for thing in items:
        if thing not in b[index]:
            if index in new:
                new[index].append(thing)
            else:
                new[index] = [thing]

Then map the dictionary onto the df:

df['new'] = df.index.map(new)

There are better ways to do this, but this should work.
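One possible more compact variant is sketched here, assuming every cell already holds a list of strings. It compares each row of the first column only with the matching row of the second, and drops duplicates while keeping the original order. The name `unique_missing` and the sample values are illustrative, not taken from the question:

```python
def unique_missing(items, restricted):
    """Return the items not found in `restricted`, without duplicates, in first-seen order."""
    restricted_set = set(restricted)  # set membership tests are fast
    # dict.fromkeys keeps the first occurrence of each key and preserves insertion order
    return list(dict.fromkeys(item for item in items if item not in restricted_set))


# Two rows of each column written as plain lists (hypothetical sample values)
a = [['Camiseta', 'Tecnica', 'hombre', 'barate'],
     ['deportivas', 'calcetin', 'hombres', 'deportivas', 'shoes']]
b = [['The', 'North', 'Face', 'manga'],
     ['shoes', 'North']]

# zip pairs row i of a with row i of b, so rows are never mixed together
new_column = [unique_missing(items, restricted) for items, restricted in zip(a, b)]
print(new_column)
```

With a DataFrame, the same list comprehension can be assigned directly, for example `df['new'] = [unique_missing(x, y) for x, y in zip(df['final_key_concat'], df['attributes_tokenize'])]`. Unlike the dictionary approach, every row gets a list, and rows with nothing left get an empty one.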

This should be what you want:

import pandas as pd

data = {'final_key_concat':[['Camiseta','Tecnica','hombre','barate'],['deportivas','calcetin','hombres','deportivas','shoes']],'attributes_tokenize':[['The','North','Face','manga'],['shoes','North']]} #recreated from your image

df = pd.DataFrame(data)

a= df['final_key_concat'].tolist() #this generates a list of lists
b = df['attributes_tokenize'].tolist()#this also generates a list of lists
#Both list a and b need to be flattened so as to access their elements the way you want it
c = [itm for sblst in a for itm in sblst] #flatten list a using list comprehension
d = [itm for sblst in b for itm in sblst] #flatten list b using list comprehension

final_list = [itm for itm in c if itm not in d] #keep the elements of c that do not appear in d

print (final_list)

Result, as reported in the answer:

['Camiseta','barate','hombres']
