python - Multiply all columns of a MultiIndex DataFrame by the appropriate value in a Series

I feel like this should be obvious, but I'm a bit stuck.

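I have a DataFrame (df) with a 3-level MultiIndex on the rows. One of the MultiIndex levels is ccy, which denotes the currency in which the information in that row is expressed. Each row has 3 columns of data.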

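I want to convert all of the data to a reference currency (say USD). To do that, I have a Series (forex) that holds the exchange rates for the relevant currencies.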

So the goal is simple: multiply all the data in each row of df by the forex value that corresponds to the ccy entry in that row's index.

The setup looks like this:

import pandas as pd
import numpy as np
import itertools

np.random.seed(0)

tuples = list(itertools.product(
                                list('abd'), 
                                ['one', 'two', 'three'], 
                                ['USD', 'EUR', 'GBP']
                                ))

np.random.shuffle(tuples)

idx = pd.MultiIndex.from_tuples(tuples[:-10], names=['letter', 'number', 'ccy'])

df = pd.DataFrame(np.random.randn(len(idx), 3), index=idx,
                  columns=['val_1', 'val_2', 'val_3'])

forex = pd.Series({'USD': 1.0,
                   'EUR': 1.3,
                   'GBP': 1.7})

I can get what I need by running:

df.apply(lambda col: col.mul(forex, level='ccy'), axis=0)

But it seems odd to me that I need pd.DataFrame.apply for such a simple case. I would have expected the following syntax (or something similar) to work:

df.mul(forex, level='ccy', axis=0)

But that gives me:

ValueError: cannot reindex from a duplicate axis

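Clearly, the apply approach is no disaster. But it seems odd that I can't figure out the syntax for doing this directly across all columns with mul. Is there a more direct way to handle this? If not, is there an intuitive reason why the mul syntax should not be enhanced to work this way?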

Solution:

This now works in master / 0.14. See the related pull request: https://github.com/pydata/pandas/pull/6682

In [11]: df.mul(forex,level='ccy',axis=0)
Out[11]: 
                      val_1     val_2     val_3
letter number ccy                              
a      one    GBP -2.172854  2.443530 -0.132098
d      three  USD  1.089630  0.096543  1.418667
b      two    GBP  1.986064  1.610216  1.845328
       three  GBP  4.049782 -0.690240  0.452957
a      two    GBP -2.304713 -0.193974 -1.435192
b      one    GBP  1.199589 -0.677936 -1.406234
d      two    GBP -0.706766 -0.891671  1.382272
b      two    EUR -0.298026  2.810233 -1.244011
d      one    EUR  0.087504  0.268448 -0.593946
              GBP -1.801959  1.045427  2.430423
b      three  EUR -0.275538 -0.104438  0.527017
a      one    EUR  0.154189  1.630738  1.844833
b      one    EUR -0.967013 -3.272668 -1.959225
d      three  GBP  1.953429 -2.029083  1.939772
              EUR  1.962279  1.388108 -0.892566
a      three  GBP  0.025285 -0.638632 -0.064980
              USD  0.367974 -0.044724 -0.302375

[17 rows x 3 columns]
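As a quick sanity check of the call above, here is a minimal sketch on a small hand-made frame. The frame `small`, its values, and the name `rates` are illustrative and not from the original post; the sketch assumes a pandas version in which the level-aware `mul` is available. It compares `mul(..., level='ccy')` with an explicit per-row rate lookup.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with the same index layout as df.
idx = pd.MultiIndex.from_tuples(
    [("a", "one", "GBP"), ("d", "three", "USD"),
     ("b", "two", "EUR"), ("a", "two", "GBP")],
    names=["letter", "number", "ccy"],
)
small = pd.DataFrame(
    np.arange(12, dtype=float).reshape(4, 3),
    index=idx,
    columns=["val_1", "val_2", "val_3"],
)
forex = pd.Series({"USD": 1.0, "EUR": 1.3, "GBP": 1.7})

# Level-aware multiplication: forex is matched against the 'ccy' level.
converted = small.mul(forex, level="ccy", axis=0)

# Explicit equivalent: look up one rate per row, then scale each row.
rates = small.index.get_level_values("ccy").map(forex).to_numpy()
expected = small.mul(rates, axis=0)
```

The explicit lookup does not rely on the `level` argument, so it may serve as a fallback on versions where the level-aware call raises the error shown in the question.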

Here is another way (also requires master / 0.14):

In [127]: df = df.sortlevel()

In [128]: df
Out[128]: 
                      val_1     val_2     val_3
letter number ccy                              
a      one    EUR  0.118607  1.254414  1.419102
              GBP -1.278149  1.437371 -0.077705
       three  GBP  0.014873 -0.375666 -0.038224
              USD  0.367974 -0.044724 -0.302375
       two    GBP -1.355714 -0.114103 -0.844231
b      one    EUR -0.743856 -2.517437 -1.507096
              GBP  0.705641 -0.398786 -0.827197
       three  EUR -0.211952 -0.080337  0.405398
              GBP  2.382224 -0.406024  0.266445
       two    EUR -0.229251  2.161717 -0.956931
              GBP  1.168273  0.947186  1.085487
d      one    EUR  0.067311  0.206499 -0.456881
              GBP -1.059976  0.614957  1.429661
       three  EUR  1.509445  1.067775 -0.686589
              GBP  1.149076 -1.193578  1.141042
              USD  1.089630  0.096543  1.418667
       two    GBP -0.415745 -0.524512  0.813101

[17 rows x 3 columns]

idx = pd.IndexSlice

In [129]: pd.concat([ df.loc[idx[:,:,x],:]*v for x,v in forex.iteritems() ])
Out[129]: 
                      val_1     val_2     val_3
letter number ccy                              
a      one    EUR  0.154189  1.630738  1.844833
b      one    EUR -0.967013 -3.272668 -1.959225
       three  EUR -0.275538 -0.104438  0.527017
       two    EUR -0.298026  2.810233 -1.244011
d      one    EUR  0.087504  0.268448 -0.593946
       three  EUR  1.962279  1.388108 -0.892566
a      one    GBP -2.172854  2.443530 -0.132098
       three  GBP  0.025285 -0.638632 -0.064980
       two    GBP -2.304713 -0.193974 -1.435192
b      one    GBP  1.199589 -0.677936 -1.406234
       three  GBP  4.049782 -0.690240  0.452957
       two    GBP  1.986064  1.610216  1.845328
d      one    GBP -1.801959  1.045427  2.430423
       three  GBP  1.953429 -2.029083  1.939772
       two    GBP -0.706766 -0.891671  1.382272
a      three  USD  0.367974 -0.044724 -0.302375
d      three  USD  1.089630  0.096543  1.418667

[17 rows x 3 columns]
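In recent pandas releases `DataFrame.sortlevel` and `Series.iteritems` are no longer available. Assuming such a version, the same slice-and-concatenate idea can be sketched with `sort_index()` and `items()`. The frame `small` and the names `ix` and `pieces` are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("a", "one", "GBP"), ("d", "three", "USD"),
     ("b", "two", "EUR"), ("a", "two", "GBP")],
    names=["letter", "number", "ccy"],
)
# Slicing a MultiIndex with IndexSlice needs a sorted index.
small = pd.DataFrame(
    np.arange(12, dtype=float).reshape(4, 3),
    index=idx,
    columns=["val_1", "val_2", "val_3"],
).sort_index()
forex = pd.Series({"USD": 1.0, "EUR": 1.3, "GBP": 1.7})

ix = pd.IndexSlice

# Take the rows for each currency, scale them by its rate, then stack the pieces.
pieces = [small.loc[ix[:, :, ccy], :] * rate for ccy, rate in forex.items()]
converted = pd.concat(pieces)
```

Note that the rows come back grouped by currency in the order of `forex`, and that every currency in `forex` is assumed to appear in the `ccy` level; a missing one would make the `.loc` lookup fail.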

Here is another way, via a merge:

In [36]: f = forex.to_frame('value')

In [37]: f.index.name =  'ccy'

In [38]: pd.merge(df.reset_index(),f.reset_index(),on='ccy')
Out[38]: 
   letter number  ccy     val_1     val_2     val_3  value
0       a    one  GBP -1.278149  1.437371 -0.077705    1.7
1       b    two  GBP  1.168273  0.947186  1.085487    1.7
2       b  three  GBP  2.382224 -0.406024  0.266445    1.7
3       a    two  GBP -1.355714 -0.114103 -0.844231    1.7
4       b    one  GBP  0.705641 -0.398786 -0.827197    1.7
5       d    two  GBP -0.415745 -0.524512  0.813101    1.7
6       d    one  GBP -1.059976  0.614957  1.429661    1.7
7       d  three  GBP  1.149076 -1.193578  1.141042    1.7
8       a  three  GBP  0.014873 -0.375666 -0.038224    1.7
9       d  three  USD  1.089630  0.096543  1.418667    1.0
10      a  three  USD  0.367974 -0.044724 -0.302375    1.0
11      b    two  EUR -0.229251  2.161717 -0.956931    1.3
12      d    one  EUR  0.067311  0.206499 -0.456881    1.3
13      b  three  EUR -0.211952 -0.080337  0.405398    1.3
14      a    one  EUR  0.118607  1.254414  1.419102    1.3
15      b    one  EUR -0.743856 -2.517437 -1.507096    1.3
16      d  three  EUR  1.509445  1.067775 -0.686589    1.3

[17 rows x 7 columns]
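The merge above only attaches the rate to each row as a `value` column; the multiplication itself still has to be done. One possible way to finish, sketched on a small illustrative frame (`small`, `value_cols` and `merged` are names chosen here, not from the original post):

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("a", "one", "GBP"), ("d", "three", "USD"),
     ("b", "two", "EUR"), ("a", "two", "GBP")],
    names=["letter", "number", "ccy"],
)
small = pd.DataFrame(
    np.arange(12, dtype=float).reshape(4, 3),
    index=idx,
    columns=["val_1", "val_2", "val_3"],
)
forex = pd.Series({"USD": 1.0, "EUR": 1.3, "GBP": 1.7})

# Same preparation as in the session above.
f = forex.to_frame("value")
f.index.name = "ccy"
merged = pd.merge(small.reset_index(), f.reset_index(), on="ccy")

# Scale the data columns row by row with the merged-in rate.
value_cols = ["val_1", "val_2", "val_3"]
merged[value_cols] = merged[value_cols].mul(merged["value"], axis=0)

# Drop the helper column and restore the original MultiIndex.
converted = merged.drop(columns="value").set_index(["letter", "number", "ccy"])
```

The row order after a merge is not guaranteed to match the original frame, so reindexing against the original index (or sorting) may be needed before comparing results.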
