Compare missing values in python pandas from another column

Problem description

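I have a Pandas DataFrame made up of two columns of values. Some values are missing, and I would like to create a third column that marks whether both columns are missing a value, both are filled, or only one of them is filled. I am not sure how to do this since I am new to pandas, so any help would be greatly appreciated.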

# input (empty strings mark the missing values)
import pandas as pd

d = {'First': ['', '', 'A', 'B', 'B', 'C'], 'Second': ['12', '', '10', '', '', '11']}
df = pd.DataFrame(data=d)

# Possible output of third column
df['Third'] = ['Secondfilled', 'missing', 'bothfilled', 'Firstfilled', 'Firstfilled', 'bothfilled']

Solution

A one-line solution without if/else or a custom function. Improved with @SeaBean's suggestion!


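# Weight First as 1 and Second as 2, so each row sums to a code from 0 to 3
d = {0: 'Missing', 1: 'FirstFilled', 2: 'SecondFilled', 3: 'BothFilled'}
df['Third'] = (df[['First', 'Second']].ne('') * (1, 2)).sum(axis=1).map(d)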
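The one-liner above packs three steps into a single expression, which can be hard to follow at first. Here is a minimal sketch that spells the steps out one at a time. The sample data is an assumption reconstructed from the printed output further down in this post, and the intermediate names `filled`, `codes`, and `labels` are only illustrative.

```python
import pandas as pd

# Assumed sample data, reconstructed from the printed output in this post;
# empty strings stand in for missing values.
df = pd.DataFrame({
    'First':  ['', '', 'A', 'B', 'B', 'C'],
    'Second': ['12', '', '10', '', '', '11'],
})

labels = {0: 'Missing', 1: 'FirstFilled', 2: 'SecondFilled', 3: 'BothFilled'}

# Step 1: True where a cell is filled (not an empty string).
filled = df[['First', 'Second']].ne('')

# Step 2: weight the columns so every combination gets a unique code:
# First contributes 1, Second contributes 2.
codes = filled['First'].astype(int) * 1 + filled['Second'].astype(int) * 2

# Step 3: translate the numeric code into a label.
df['Third'] = codes.map(labels)
print(df)
```

If the missing values were real NaN values rather than empty strings, `notna()` could probably take the place of `ne('')` in step 1.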

You can use apply() together with a lookup dictionary.

lookup = {'10': 'Firstfilled','01': 'Secondfilled','11': 'bothfilled','00': 'missing'}

def fill(row):
    key = '00'

    if row['First'] != '':
        key = '1' + key[1]

    if row['Second'] != '':
        key = key[0] + '1'

    return lookup[key]

df['Third'] = df.apply(fill,axis=1)

# print(df)

  First Second         Third
0           12  Secondfilled
1                    missing
2     A     10    bothfilled
3     B          Firstfilled
4     B          Firstfilled
5     C     11    bothfilled
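The same lookup-dictionary idea can also be expressed without a row-wise apply(), which may matter on larger frames. This is a sketch of that variant and not part of the original answer. It builds the two-character key for every row at once using string concatenation on whole columns; the sample data is assumed from the output shown above, and `first_bit` and `second_bit` are illustrative names.

```python
import pandas as pd

# Assumed sample data matching the printed output above.
df = pd.DataFrame({
    'First':  ['', '', 'A', 'B', 'B', 'C'],
    'Second': ['12', '', '10', '', '', '11'],
})

lookup = {'10': 'Firstfilled', '01': 'Secondfilled', '11': 'bothfilled', '00': 'missing'}

# '1' where the column is filled, '0' where it holds an empty string.
first_bit = df['First'].ne('').astype(int).astype(str)
second_bit = df['Second'].ne('').astype(int).astype(str)

# Concatenate the two flags into the key ('00', '01', '10', '11') and look it up.
df['Third'] = (first_bit + second_bit).map(lookup)
print(df)
```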
