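Problem description
I have a Series containing numeric values, and I want to replace them with string labels by using a dictionary. However, I don't know how to do it...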
GDP_group['GdpForYearPer$1M'].head(5)
0 46.919625
1 47.515189
2 47.737955
3 54.832578
4 56.338028
5 63.101272
This is the dictionary I set up for the replacement:
range_GDP = {
    '$0 ~ $100M': np.arange(0, 100),
    '$100M ~ $1B': np.arange(100.0000001, 1000),
    '$1B ~ $10B': np.arange(1000.000001, 10000),
    '$10B ~ $100B': np.arange(10000.000001, 100000),
    '$100B ~ $1T': np.arange(100000.000001, 1000000),
    '$1T ~': np.arange(1000000.000001, 20000000),
}
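Solution
You can use pd.cut to divide the data into ranges (bins) and apply a label to each one.
First, (re)generate dummy data sampled uniformly in log space: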
import numpy as np
import pandas as pd
GdpForYearPer1M = pd.Series(10**np.random.randint(0,8,100))
"""
0 1
1 1000
2 100
3 10
4 100
...
95 1000000
96 100
97 100000
98 10000
99 10
"""
Solution:
# generate "cuts" (bins) and associated labels from `range_GDP`.
cut_data = [(np.min(v),k) for k,v in range_GDP.items()]
bins,labels = zip(*cut_data)
# bins required to have one more value than labels
bins = list(bins) + [np.inf]
pd.cut(GdpForYearPer1M,bins=bins,labels=labels)
Output:
0 $0 ~ $100M
1 $100M ~ $1B
2 $0 ~ $100M
3 $0 ~ $100M
4 $0 ~ $100M
...
95 $100B ~ $1T
96 $0 ~ $100M
97 $10B ~ $100B
98 $1B ~ $10B
99 $0 ~ $100M
Length: 100, dtype: category
Categories (6, object): [$0 ~ $100M < $100M ~ $1B < $1B ~ $10B < $10B ~ $100B < $100B ~ $1T < $1T ~]
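As a variation on the approach above, the bin edges can also be written out directly instead of being derived from the np.arange arrays in range_GDP. The following is only a minimal sketch under that assumption: the sample values, the `edges` list, and the names `gdp` and `groups` are illustrative and not from the original post. Passing include_lowest=True is an added choice here, so that a value of exactly 0 falls into the first bin rather than becoming NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values in units of $1M (not the original data).
gdp = pd.Series([0, 46.9, 100, 100.5, 1_000, 250_000, 3_000_000])

# Explicit bin edges; there must be one more edge than there are labels.
edges = [0, 100, 1_000, 10_000, 100_000, 1_000_000, np.inf]

# Labels reuse the keys of range_GDP from the question.
labels = ['$0 ~ $100M', '$100M ~ $1B', '$1B ~ $10B',
          '$10B ~ $100B', '$100B ~ $1T', '$1T ~']

# With the default right=True each bin is (lower, upper], so 100 belongs to
# the first bin. include_lowest=True also puts the value 0 in the first bin.
groups = pd.cut(gdp, bins=edges, labels=labels, include_lowest=True)
print(groups)
```

Writing the edges by hand avoids allocating the large np.arange arrays, which are only used for their minimum value in the solution above. If the lower edge of each bin should be inclusive instead, pd.cut also accepts right=False.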