Add ',' between the values of two columns only when both cells are non-empty

Problem description

I have the following dataframe:

>>>name   breakfast  lunch   dinner
0 Zoey    apple      egg     noodels
1 Rena    pear               pasta
2 Shila             tomato  potatoes
3 Daphni coffee             soup 
4 Dufi                  

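I want to create a new column containing all the foods each person ate that day. I tried joining the columns with '+' and separating the words with ',', like this: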

df['food']=df['breakfast']+','+df['lunch']+','+df['dinner']

But where there are empty values, I get stray ',' characters:


>>>name   breakfast  lunch   dinner     food
0 Zoey    apple      egg     noodels    apple,egg,noodels
1 Rena    pear               pasta      pear,,pasta
2 Shila             tomato  potatoes    ,tomato,potatoes
3 Daphni coffee             soup       coffee,,soup
4 Dufi                                  ,,

I want it tidy, with ',' only in the right places, i.e. no comma where a value is empty:

>>>name   breakfast  lunch   dinner     food
0 Zoey    apple      egg     noodels    apple,egg,noodels
1 Rena    pear               pasta      pear,pasta
2 Shila             tomato  potatoes    tomato,potatoes
3 Daphni coffee             soup       coffee,soup
4 Dufi

Is there a way to do this, i.e. detect empty cells and neither add them nor leave a comma in the wrong place?

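Solution

Use .stack and groupby on the index

This assumes your blanks are actually null values (NaN).

Since we don't want the name in the result, we can either move it into the index or drop it; here I moved it into the index.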

df['food'] = df.set_index('name',append=True).stack().groupby(level=0).agg(','.join)
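For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame construction is my assumption about how the sample data is stored (NaN for every blank). I also added an explicit .dropna() after .stack(), because whether stack drops missing values on its own can depend on the pandas version; with the explicit call the result should not depend on that.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample data, with NaN for every blank cell.
df = pd.DataFrame({
    "name": ["Zoey", "Rena", "Shila", "Daphni", "Dufi"],
    "breakfast": ["apple", "pear", np.nan, "coffee", np.nan],
    "lunch": ["egg", np.nan, "tomato", np.nan, np.nan],
    "dinner": ["noodels", "pasta", "potatoes", "soup", np.nan],
})

# Move 'name' into the index so only the meal columns get stacked,
# then drop missing values explicitly before joining.
stacked = df.set_index("name", append=True).stack().dropna()

# Level 0 is the original row label, so each group holds one row's foods.
df["food"] = stacked.groupby(level=0).agg(",".join)

print(df["food"].tolist())
# ['apple,egg,noodels', 'pear,pasta', 'tomato,potatoes', 'coffee,soup', nan]
```

Rows with no food at all (Dufi) have no entries left after stacking, so they come back as NaN when the result is aligned to df.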

If your blanks are not null but a single space, replace them with NaN first:

import numpy as np

df['food'] = df.replace(' ',np.nan).set_index('name',append=True).stack()\
                       .groupby(level=0).agg(','.join)

     name breakfast   lunch    dinner               food
0    Zoey     apple     egg   noodels  apple,egg,noodels
1    Rena      pear     NaN     pasta         pear,pasta
2   Shila       NaN  tomato  potatoes    tomato,potatoes
3  Daphni    coffee     NaN      soup        coffee,soup
4    Dufi       NaN     NaN       NaN                NaN
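Replacing ' ' only catches cells that are exactly one space. If the blanks might be empty strings or several spaces, one option is a regex replace. This is a sketch under that assumption; the sample data below is made up to mix the different kinds of blank, and I select the meal columns directly instead of moving name into the index.

```python
import numpy as np
import pandas as pd

# Assumed sample data where blanks are '', ' ' or several spaces.
df = pd.DataFrame({
    "name": ["Zoey", "Rena", "Shila", "Daphni", "Dufi"],
    "breakfast": ["apple", "pear", "", "coffee", "  "],
    "lunch": ["egg", " ", "tomato", "", ""],
    "dinner": ["noodels", "pasta", "potatoes", "soup", " "],
})

cols = ["breakfast", "lunch", "dinner"]

# Turn empty or whitespace-only cells into NaN.
cleaned = df[cols].replace(r"^\s*$", np.nan, regex=True)

# Stack, drop the NaN entries, and join what is left per original row.
df["food"] = cleaned.stack().dropna().groupby(level=0).agg(",".join)

print(df["food"].tolist())
# ['apple,egg,noodels', 'pear,pasta', 'tomato,potatoes', 'coffee,soup', nan]
```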

If there are no missing values, only empty strings, a solution is to filter out the empty strings before joining:

cols = ['breakfast','lunch','dinner']
df['food'] = df[cols].apply(lambda x: ','.join(y for y in x if y != ''),axis=1)
print (df)
     name breakfast   lunch    dinner               food
0    Zoey     apple     egg   noodels  apple,egg,noodels
1    Rena      pear             pasta         pear,pasta
2   Shila            tomato  potatoes    tomato,potatoes
3  Daphni    coffee              soup        coffee,soup
4    Dufi

Or with a list comprehension:

cols = ['breakfast','lunch','dinner']
df['food'] = [','.join(y for y in x if y != '') for x in df[cols].to_numpy()]
print (df)
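     name breakfast   lunch    dinner               food
0    Zoey     apple     egg   noodels  apple,egg,noodels
1    Rena      pear             pasta         pear,pasta
2   Shila            tomato  potatoes    tomato,potatoes
3  Daphni    coffee              soup        coffee,soup
4    Dufi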

The same idea works if the blanks are missing values, using the fact that NaN != NaN:

cols = ['breakfast','lunch','dinner']
df['food'] = [','.join(y for y in x if y == y) for x in df[cols].to_numpy()]
print (df)
     name breakfast   lunch    dinner               food
0    Zoey     apple     egg   noodels  apple,egg,noodels
1    Rena      pear     NaN     pasta         pear,pasta
2   Shila       NaN  tomato  potatoes    tomato,potatoes
3  Daphni    coffee     NaN      soup        coffee,soup
4    Dufi       NaN     NaN       NaN
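The y == y test works because NaN is the only value here that compares unequal to itself, so it is filtered out while real strings are kept. It does not cover empty strings, and it raises a TypeError on pd.NA, which cannot be used as a boolean. If you are not sure which kind of blank you have, a small helper can handle all of them at once. This is only a sketch: join_nonblank is a hypothetical name of mine, and the sample data is made up to mix the cases.

```python
import numpy as np
import pandas as pd

def join_nonblank(values, sep=","):
    """Join one row's entries, skipping NaN/None/pd.NA and blank strings.

    Hypothetical helper for illustration, not part of pandas.
    """
    kept = [str(v) for v in values if pd.notna(v) and str(v).strip()]
    return sep.join(kept)

# Assumed sample data mixing NaN, None, '' and ' ' as blanks.
df = pd.DataFrame({
    "name": ["Zoey", "Rena", "Shila", "Daphni", "Dufi"],
    "breakfast": ["apple", "pear", np.nan, "coffee", ""],
    "lunch": ["egg", "", "tomato", None, np.nan],
    "dinner": ["noodels", "pasta", "potatoes", "soup", " "],
})

cols = ["breakfast", "lunch", "dinner"]
df["food"] = [join_nonblank(row) for row in df[cols].to_numpy()]

print(df["food"].tolist())
# ['apple,egg,noodels', 'pear,pasta', 'tomato,potatoes', 'coffee,soup', '']
```

Note that a row with nothing to join ends up as an empty string here, not NaN, the same as with the list comprehensions above.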