Problem description
I have the following DataFrame:
   name    breakfast  lunch   dinner
0  Zoey    apple      egg     noodels
1  Rena    pear               pasta
2  Shila              tomato  potatoes
3  Daphni  coffee             soup
4  Dufi
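I want to create a new column containing all the food values each person ate on the same day. I tried using '+' with ',' as a separator between the words, like this: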
df['food']=df['breakfast']+','+df['lunch']+','+df['dinner']
But when there are empty values, I end up with stray ',' separators:
   name    breakfast  lunch   dinner    food
0  Zoey    apple      egg     noodels   apple,egg,noodels
1  Rena    pear               pasta     pear,,pasta
2  Shila              tomato  potatoes  ,tomato,potatoes
3  Daphni  coffee             soup      coffee,,soup
4  Dufi                                 ,,
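The behaviour described above can be reproduced with a minimal sketch. It assumes the blank cells are empty strings; the frame is rebuilt by hand from the question, so the exact construction is an assumption rather than the asker's original code.

```python
import pandas as pd

# Sample frame rebuilt from the question; blanks are ASSUMED to be empty strings.
df = pd.DataFrame({
    "name": ["Zoey", "Rena", "Shila", "Daphni", "Dufi"],
    "breakfast": ["apple", "pear", "", "coffee", ""],
    "lunch": ["egg", "", "tomato", "", ""],
    "dinner": ["noodels", "pasta", "potatoes", "soup", ""],
})

# Plain concatenation emits a separator for every column, even the empty ones.
naive = df["breakfast"] + "," + df["lunch"] + "," + df["dinner"]
print(naive.tolist())
# ['apple,egg,noodels', 'pear,,pasta', ',tomato,potatoes', 'coffee,,soup', ',,']
```

Every row gets exactly two commas regardless of how many cells are filled, which is why the separators land in the wrong places.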
I would like the result to be tidy, with ',' only in the right places, that is, no separator wherever a value is missing:
   name    breakfast  lunch   dinner    food
0  Zoey    apple      egg     noodels   apple,egg,noodels
1  Rena    pear               pasta     pear,pasta
2  Shila              tomato  potatoes  tomato,potatoes
3  Daphni  coffee             soup      coffee,soup
4  Dufi
Is there a way to do this? In other words, detect empty cells and skip them, so that no separator ends up in the wrong place.
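Solution
Use .stack followed by groupby on the index.
This assumes your blanks are actually null values.
Since we don't want the name in the joined result, we can either move it into the index or drop it; here I have added it to the index.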
df['food'] = df.set_index('name',append=True).stack().groupby(level=0).agg(','.join)
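If your blanks are not nulls (for example, they hold a ' ' string), replace them with NaN first (with numpy imported as np):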
df.replace(' ',np.nan).set_index('name',append=True).stack()\
.groupby(level=0).agg(','.join)
name breakfast lunch dinner food
0 Zoey apple egg noodels apple,egg,noodels
1 Rena pear pasta NaN pear,pasta
2 Shila tomato potatoes NaN tomato,potatoes
3 Daphni coffee soup NaN coffee,soup
4 Dufi NaN NaN NaN NaN
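The stack and groupby approach above can be sketched end to end as follows. The sample frame is rebuilt by hand with NaN for the blanks, which is an assumption about the asker's data. The explicit .dropna() is also an addition not present in the answer: older pandas versions drop nulls inside .stack() by default, while newer versions may keep them, so dropping them explicitly should behave the same either way.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; blanks are ASSUMED to be NaN.
df = pd.DataFrame({
    "name": ["Zoey", "Rena", "Shila", "Daphni", "Dufi"],
    "breakfast": ["apple", "pear", np.nan, "coffee", np.nan],
    "lunch": ["egg", np.nan, "tomato", np.nan, np.nan],
    "dinner": ["noodels", "pasta", "potatoes", "soup", np.nan],
})

# Move 'name' into the index so only the food columns get stacked,
# then reshape to one value per (row, name, meal) and discard the nulls.
stacked = df.set_index("name", append=True).stack().dropna()

# Level 0 is the original row label, so each group is one person's meals.
df["food"] = stacked.groupby(level=0).agg(",".join)
print(df)
```

Rows with no food at all have no group, so index alignment on assignment leaves NaN in their food cell, which matches the last row of the output shown above.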
If there are no missing values and the blanks are empty strings, filter out the empty strings and join only the remaining values:
cols = ['breakfast','lunch','dinner']
df['food'] = df[cols].apply(lambda x: ','.join(y for y in x if y != ''),axis=1)
print (df)
   name    breakfast  lunch   dinner    food
0  Zoey    apple      egg     noodels   apple,egg,noodels
1  Rena    pear               pasta     pear,pasta
2  Shila              tomato  potatoes  tomato,potatoes
3  Daphni  coffee             soup      coffee,soup
4  Dufi
Or with a list comprehension:
cols = ['breakfast','lunch','dinner']
df['food'] = [','.join(y for y in x if y != '') for x in df[cols].to_numpy()]
print (df)
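   name    breakfast  lunch   dinner    food
0  Zoey    apple      egg     noodels   apple,egg,noodels
1  Rena    pear               pasta     pear,pasta
2  Shila              tomato  potatoes  tomato,potatoes
3  Daphni  coffee             soup      coffee,soup
4  Dufi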
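If the blanks are missing values, the solution is similar; just use the fact that NaN != NaN:
cols = ['breakfast','lunch','dinner']
# y == y is False only for NaN, so missing values are skipped
df['food'] = [','.join(y for y in x if y == y) for x in df[cols].to_numpy()]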
print (df)
name breakfast lunch dinner food
0 Zoey apple egg noodels apple,egg,noodels
1 Rena pear NaN pasta pear,pasta
2 Shila NaN tomato potatoes tomato,potatoes
3 Daphni coffee NaN soup coffee,soup
4 Dufi NaN NaN NaN
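When it is unclear whether the blanks are NaN, empty strings, or whitespace, the two filters above can be combined. The sketch below is one possible way to do that, not part of either answer; join_present is a hypothetical helper name, and the mixed sample data is made up to exercise all three kinds of blank.

```python
import numpy as np
import pandas as pd

def join_present(values, sep=","):
    """Join the entries that are neither missing nor blank strings.

    Hypothetical helper, introduced here for illustration only.
    """
    kept = [v for v in values if pd.notna(v) and str(v).strip() != ""]
    return sep.join(kept)

# Sample frame with a deliberate mix of NaN, '' and ' ' as blanks (assumption).
df = pd.DataFrame({
    "name": ["Zoey", "Rena", "Shila", "Daphni", "Dufi"],
    "breakfast": ["apple", "pear", np.nan, "coffee", ""],
    "lunch": ["egg", "", "tomato", " ", np.nan],
    "dinner": ["noodels", "pasta", "potatoes", "soup", np.nan],
})

cols = ["breakfast", "lunch", "dinner"]
# Iterate over the rows as plain arrays, as in the list-comprehension answer.
df["food"] = [join_present(row) for row in df[cols].to_numpy()]
print(df["food"].tolist())
# ['apple,egg,noodels', 'pear,pasta', 'tomato,potatoes', 'coffee,soup', '']
```

Unlike the stack approach, this leaves an empty string rather than NaN for a row with no food, the same as the list-comprehension versions above.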