python – groupby.first()和groupby.head(1)之间有什么区别?

两者都返回每组第一行的DataFrame.在阅读API参考时,它首先说“计算第一组值”,但是当并排查看两个输出时,我看不到主要区别.

我错过了什么吗?

df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                    'value'  : ["first","second","second","first",
                                "second","first","third","fourth",
                                "fifth","second","fifth","first",
                                "first","second","third","fourth","fifth"]})

First API

解决方法:

主要区别在于first()将跳转到第一个非null值,而head(1)则不会.

如果我把np.nan放到你的例子中:

df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value'  : [np.nan,"second","second","first",
                               "second","first","third","fourth",
                               "fifth","second","fifth","first",
                               "first","second","third","fourth","fifth"]})

然后我们有:

>>> df.groupby('id').head(1)
    id   value
0    1     NaN      # NaN is included
3    2   first
5    3   first
9    4  second
11   5   first
12   6   first
15   7  fourth

>>> df.groupby('id').first()
     value
id        
1   second          # NaN is skipped
2    first
3    first
4   second
5    first
6    first
7   fourth

(另外,如您所见,head()重置索引.)

相关文章

转载:一文讲述Pandas库的数据读取、数据获取、数据拼接、数...
Pandas是一个开源的第三方Python库,从Numpy和Matplotlib的基...
整体流程登录天池在线编程环境导入pandas和xrld操作EXCEL文件...
 一、numpy小结             二、pandas2.1为...
1、时间偏移DateOffset对象DateOffset类似于时间差Timedelta...
1、pandas内置样式空值高亮highlight_null最大最小值高亮背景...