Python - Replicating R's mapply() functionality in Python

Problem description

I used mapply() in R to compute the percentile rank of each feature value for particular rows of a dataset. Here is the R code:

library(MASS)
boston = Boston
# Suburb(s) with lowest median home value
low.medv <- boston[boston$medv == min(boston$medv),]
low.medv
#        crim zn indus chas   nox    rm age    dis rad tax ptratio  black lstat medv
# 399 38.3518  0  18.1    0 0.693 5.453 100 1.4896  24 666    20.2 396.90 30.59    5
# 406 67.9208  0  18.1    0 0.693 5.683 100 1.4254  24 666    20.2 384.97 22.98    5

# quantile ranks of every feature for the lowest-medv suburbs
perc = data.frame(round(mapply(function(x,y) ecdf(x)(y),boston,low.medv),3),row.names = paste(rownames(low.medv),'_P',sep = ""))

Expected output

perc
#        crim    zn indus  chas   nox    rm age   dis rad  tax ptratio black lstat  medv
# 399_P 0.988 0.735 0.887 0.931 0.858 0.077   1 0.057   1 0.99   0.889  1.00 0.978 0.004
# 406_P 0.996 0.735 0.887 0.931 0.858 0.136   1 0.042   1 0.99   0.889  0.35 0.899 0.004

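Question:
I am trying to replicate this in Python.

  1. How can I replicate the above in Python?
  2. Is there a function in Python that does the same job as mapply()?

Here is the Python code to reproduce the setup: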

import pandas as pd
import numpy as np
from scipy.stats import percentileofscore as ptile
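# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import needs an older scikit-learn release.
from sklearn.datasets import load_boston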
boston = load_boston()
df = pd.DataFrame(boston.data,columns=boston.feature_names)
df['medv'] = boston.target
df.columns

# Suburb(s) with lowest median home value
low_medv = df[df.medv == min(df.medv)]
low_medv

It can be done with two for loops:

perc = pd.DataFrame()
for c in low_medv.columns:
    for i in low_medv.index:
        perc.loc[i,c] = round(ptile(df[c],low_medv.loc[i,c]),3)
# ptile calculates the percentile rank of a given value
perc

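But is this the most efficient way?

Solution

pandas has DataFrame.rank, which with pct=True returns percentile ranks for every column at once: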

# min(df.medv) is not vectorized
low_medv = df[df.medv == df.medv.min()]

(df.rank(method='average',pct=True)
   .loc[low_medv.index]
   .mul(100).round(3)
)

Output:

       CRIM      ZN   INDUS   CHAS     NOX      RM    AGE    DIS     RAD     TAX  PTRATIO       B   LSTAT   medv
398  98.814  36.858  75.791  46.64  84.486   7.708  95.85  5.731  87.055  86.067   75.198  88.142  97.826  0.296
405  99.605  36.858  75.791  46.64  84.486  13.636  95.85  4.150  87.055  86.067   75.198  34.980  89.921  0.296
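
Note that the output above uses method='average', which averages the ranks of tied values, so columns with many ties differ from the R result (for example ZN: 36.858 here versus the expected 0.735). R's ecdf(x)(y) is the fraction of values less than or equal to y, which appears to correspond to method='max'. The following is a minimal sketch of that idea, not the answer's original code. It uses a small made-up DataFrame in place of the Boston data, and the helper name ecdf_at is hypothetical. It also shows a mapply-style alternative that pairs each full column with the matching column of the selected rows, which is roughly what mapply(function(x,y) ..., boston, low.medv) does.

```python
import numpy as np
import pandas as pd

# Small synthetic frame standing in for the Boston data (values are made up).
df = pd.DataFrame({
    "a": [1.0, 2.0, 2.0, 3.0, 5.0],
    "b": [10.0, 10.0, 20.0, 30.0, 30.0],
})
low = df[df["a"] == df["a"].min()]  # rows whose percentile ranks we want

# Vectorized: method='max' gives tied values the highest rank in the group,
# so with pct=True each entry is (number of values <= x) / n, i.e. the ECDF.
ecdf_rank = df.rank(method="max", pct=True).loc[low.index].round(3)


def ecdf_at(column, values):
    """Fraction of `column` that is <= each entry of `values` (hypothetical helper)."""
    sorted_col = np.sort(column.to_numpy())
    counts = np.searchsorted(sorted_col, values.to_numpy(), side="right")
    return counts / len(sorted_col)


# mapply-style: apply the function to (full column, selected values) pairs.
ecdf_loop = pd.DataFrame(
    {c: ecdf_at(df[c], low[c]) for c in df.columns},
    index=low.index,
).round(3)

print(ecdf_rank)
print(ecdf_loop)
```

Both frames should agree. The values stay on the 0 to 1 scale like the R output; append .mul(100) if percentages are preferred. For the original two-loop version, scipy's percentileofscore also accepts kind='weak', which is documented as the cumulative-distribution definition.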