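How can I concatenate two string columns in Pandas while excluding words from the second string that already appear in the first?

Problem description

Here's the situation. I have something like this: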

col1                          col2
This is a blue book           blue book above
This is a green ball          this is a ball
What is your name             blue book above

I want to create col3 like this:

col1                          col2                  col3
this is a blue book           blue book above       this is a blue book above
this is a green ball          this is a ball        this is a green ball
what is your name             blue book above       what is your name blue book above

I can't find a way to make this work.

Solution


Try this:

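def f(r):
    # r is one row holding the col1 and col2 values
    c1, c2 = r
    s1 = c1.split()
    # keep only the words of col2 that do not already occur in col1
    # (the comparison is case-sensitive)
    s3 = [s for s in c2.split() if s not in s1]
    # join through a list so no trailing space is left when s3 is empty
    return " ".join([c1] + s3)

df["col3"] = df[["col1", "col2"]].apply(f, axis=1)
df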
                   col1             col2                               col3
0   this is a blue book  blue book above          this is a blue book above
1  this is a green ball   this is a ball               this is a green ball
2     what is your name  blue book above  what is your name blue book above
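
The sample data in the question capitalizes the first word of col1 ("This", "What"), so a case-sensitive check would treat "this" in col2 as a new word. Below is a minimal, self-contained sketch that compares words case-insensitively instead. It assumes that is the desired behavior; the helper name append_new_words is made up for illustration, and the DataFrame is rebuilt from the question's table.

```python
import pandas as pd


def append_new_words(first: str, second: str) -> str:
    """Append to `first` the words of `second` that `first` does not contain.

    Words are compared case-insensitively (an assumption, see above).
    """
    seen = {w.lower() for w in first.split()}
    extra = []
    for w in second.split():
        key = w.lower()
        if key not in seen:
            extra.append(w)
            seen.add(key)  # also skips repeats inside `second` itself
    # joining through a list avoids a trailing space when nothing is added
    return " ".join([first] + extra)


# sample data copied from the question
df = pd.DataFrame({
    "col1": ["This is a blue book", "This is a green ball", "What is your name"],
    "col2": ["blue book above", "this is a ball", "blue book above"],
})

# a plain list comprehension over the two columns; no row-wise apply needed
df["col3"] = [append_new_words(a, b) for a, b in zip(df["col1"], df["col2"])]
print(df["col3"].tolist())
```

The result keeps the original capitalization of col1. If the all-lowercase col3 shown in the question is wanted, calling .str.lower() on col3 afterwards should give that.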