Reorder the rows of a Series based on the index of a value

Problem description

I have a large pandas Series, so performance is key:

s = pd.Series(['I like apples', 'They went skiing vacation', 'Apples are tasty', 'The skiing was great'], dtype='string')

0                I like apples
1    They went skiing vacation
2             Apples are tasty
3         The skiing was great
dtype: string

Think of each row as a list of strings, i.e. row 0 is ['I', 'like', 'apples'].

I want to get the index of a word such as 'apples' within each row and reorder the rows by that index. For this example, the Series would become:

2             Apples are tasty
0                I like apples
1    They went skiing vacation
3         The skiing was great
dtype: string

because the index of 'apples' (ignoring case) in row 2 is 0.
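To make the ordering rule concrete, here is a minimal sketch that computes the position of the target word in each row. The helper name `word_position` is hypothetical, and the sketch assumes whitespace tokenization and no missing values.

```python
import pandas as pd

s = pd.Series(
    ["I like apples", "They went skiing vacation",
     "Apples are tasty", "The skiing was great"],
    dtype="string",
)

def word_position(text, word="apples"):
    """Return the 0-based position of `word` in `text` (case-insensitive), or None."""
    tokens = text.lower().split()
    return tokens.index(word) if word in tokens else None

positions = [word_position(x) for x in s]
print(positions)  # [2, None, 0, None]
```

Rows with a position come first, smallest position first; rows with None keep their original relative order.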

Solution

Use Series.str.contains:

# split each row into words, then stack into a Series indexed by (row, word position)
s1 = s.str.split(expand=True).stack()
# keep only the words matching 'apples', sorted by the second level (the position of 'apples')
idx = s1[s1.str.contains('apples', case=False)].sort_index(level=1).index

# take the matched row labels, union them with the original index (unsorted), and reorder with loc
s = s.loc[idx.remove_unused_levels().levels[0].union(s.index, sort=False)]
print(s)
2             Apples are tasty
0                I like apples
1    They went skiing vacation
3         The skiing was great
dtype: string
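The intermediate values are easier to follow when printed. The sketch below repeats the same split/stack/filter steps, but as an assumption of mine it swaps the final step for `get_level_values(0)` plus an explicit append of the unmatched labels, instead of `remove_unused_levels().levels[0].union(...)`. The names `words`, `hits`, `matched` and `rest` are illustrative only.

```python
import pandas as pd

s = pd.Series(
    ["I like apples", "They went skiing vacation",
     "Apples are tasty", "The skiing was great"],
    dtype="string",
)

# One entry per word, indexed by (row label, word position).
# dropna() guards against pandas versions whose stack() keeps missing cells.
words = s.str.split(expand=True).stack().dropna()

# Matching words, ordered by word position (level 1 of the MultiIndex).
hits = words[words.str.contains("apples", case=False, na=False)].sort_index(level=1)
pairs = [(int(row), int(pos)) for row, pos in hits.index]
print(pairs)  # [(2, 0), (0, 2)]

# Matched row labels in that order, followed by the unmatched rows as they were.
matched = hits.index.get_level_values(0).unique()
rest = s.index[~s.index.isin(matched)]
result = s.loc[matched.append(rest)]
print(result.index.tolist())  # [2, 0, 1, 3]
```

Note that `str.contains` matches substrings, so a word such as 'pineapples' would also count as a hit.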

Another idea, using a list comprehension and enumerate:

# position of 'apples' in each row; rows without it get a large sentinel (len(s)*10)
a = [next(iter(i for i, j in enumerate(x.split()) if j.lower() == 'apples'), len(s)*10) for x in s]
print(a)
[2, 40, 0, 40]

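import numpy as np
# a stable sort keeps rows without a match in their original order; iloc because argsort returns positions
s = s.iloc[np.argsort(a, kind='stable')]
print(s)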
2             Apples are tasty
0                I like apples
1    They went skiing vacation
3         The skiing was great
dtype: string
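The list-comprehension idea can be wrapped in a small function. This is only a sketch: the name `reorder_by_word` is hypothetical, it assumes no missing values, and it uses infinity rather than `len(s)*10` as the "not found" key. A stable sort and positional indexing with `iloc` keep it correct when the Series does not have a default integer index.

```python
import numpy as np
import pandas as pd

def reorder_by_word(s, word):
    """Move rows containing `word` to the front, ordered by the word's position."""
    target = word.lower()
    keys = []
    for text in s:
        tokens = text.lower().split()
        # Rows without the word sort last; infinity is larger than any position.
        keys.append(tokens.index(target) if target in tokens else float("inf"))
    # Stable sort: ties (including all unmatched rows) keep their original order.
    order = np.argsort(keys, kind="stable")
    return s.iloc[order]

s = pd.Series(
    ["I like apples", "They went skiing vacation",
     "Apples are tasty", "The skiing was great"],
    index=list("abcd"),
    dtype="string",
)
result = reorder_by_word(s, "apples")
print(list(result.index))  # ['c', 'a', 'b', 'd']
```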