Transforming a dataframe to scale values per variable

Problem description

I have a dataframe that looks like this:

            year    vm  Gebied      Naam stadsdeel  Wijk                    variable                value
        0   2014    T94 Zuidoost    Bijlmer-Oost    Bijlmer Oost (E,G,K)    ipsl_risicoperceptie    93.891900
        1   2015    T94 Zuidoost    Bijlmer-Oost    Bijlmer Oost (E,K)    ipsl_risicoperceptie    94.510000
        2   2016    T94 Zuidoost    Bijlmer-Oost    Bijlmer Oost (E,K)    ipsl_risicoperceptie    97.575000
        3   2017    T94 Zuidoost    Bijlmer-Oost    Bijlmer Oost (E,K)    ipsl_risicoperceptie    96.877500
        4   2018    T94 Zuidoost    Bijlmer-Oost    Bijlmer Oost (E,K)    ipsl_risicoperceptie    97.175000
        5   2019    T94 Zuidoost    Bijlmer-Oost    Bijlmer Oost (E,K)    ipsl_risicoperceptie    100.487500
        6   2014    A08 Centrum     Centrum-Oost    Weesperbuurt/Plantage   ipsl_risicoperceptie    97.394115
        7   2015    A08 Centrum     Centrum-Oost    Weesperbuurt/Plantage   ipsl_risicoperceptie    96.160000
        8   2016    A08 Centrum     Centrum-Oost    Weesperbuurt/Plantage   ipsl_risicoperceptie    97.750000
        9   2017    A08 Centrum     Centrum-Oost    Weesperbuurt/Plantage   ipsl_risicoperceptie    98.820000

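There are multiple variables, each on a different scale and with a different minimum and maximum per year. I want to transform the values so that each variable sits on a 1-100 scale within each year: the minimum of a variable in a given year becomes 1, and its maximum in that year becomes 100. A description of the variables is shown below.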

df.groupby(['variable','year']).describe()

                    count   mean        std         min         25%         50%         75%         max  
variable    year                                
HICindex    2014    94.0    92.282244   26.901022   31.602504   75.826290   91.257552   111.136273  157.578866
            2015    94.0    90.381516   29.872063   16.600000   70.397500   83.555000   108.947500  169.840000
            2016    97.0    84.735893   27.558587   29.480000   63.180000   84.490000   103.760000  169.600000
            2017    97.0    81.702208   26.291037   22.490000   59.990000   82.510000   95.110000   159.820000
            2018    97.0    84.484390   28.148936   26.330000   68.710000   78.710000   96.960000   170.170000
            2019    97.0    80.629880   26.166200   26.530000   64.170000   76.340000   99.140000   167.383333
HVCIndex    2014    94.0    102.252289  29.787177   53.111797   84.784686   100.954751  114.647216  214.428036
            2015    94.0    96.295904   28.732603   34.280000   78.850000   94.195000   114.662500  199.820000
            2016    97.0    92.988050   29.093444   44.560000   74.410000   88.240000   110.220000  187.810000
            2017    97.0    86.563471   28.395480   33.730000   69.100000   82.060000   100.410000  195.920000
            2018    97.0    77.429003   29.287222   18.580000   61.050000   73.590000   88.950000   216.780000
            2019    97.0    80.240825   30.354648   19.610000   61.795000   76.830000   90.700000   239.510000
Ioverlast   2014    94.0    110.555446  59.265498   27.438722   83.847542   102.407453  122.734040  472.941648
            2015    94.0    107.138076  60.195996   30.640000   77.475000   98.160000   120.705000  480.100000
            2016    97.0    112.180086  68.316333   34.140000   81.300000   102.480000  117.910000  547.700000
            2017    97.0    113.205696  67.792895   31.080000   77.510000   102.310000  120.880000  498.190000
            2018    97.0    108.790326  71.469753   36.500000   72.070000   98.550000   116.740000  537.760000
            2019    97.0    113.551065  66.994786   22.630000   81.410000   106.540000  125.210000  507.910000

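As you can see above, the variables (3 of the 7 in total are shown) all have different ranges. I want to normalize the range of each variable per year so that they can all be plotted on a radar chart with the same fixed axis. Which transform/scaling function in pandas/sklearn is best suited to grouping by variable and year and scaling appropriately? Thanks in advance!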

Solution
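
One possible approach, offered as a minimal sketch rather than a confirmed answer, is a grouped min-max rescale: group by `variable` and `year`, then map each group's `value` linearly so that the group minimum becomes 1 and the group maximum becomes 100. The helper name `scale_to_range`, the output column `scaled`, and the toy numbers below are all assumptions made for illustration; only the column names `variable`, `year`, and `value` come from the question. How to handle a group whose values are all identical is not specified in the question, so the sketch maps such a group to the midpoint of the range.

```python
import pandas as pd


def scale_to_range(values: pd.Series, low: float = 1.0, high: float = 100.0) -> pd.Series:
    """Linearly rescale one group so its min maps to `low` and its max to `high`."""
    vmin, vmax = values.min(), values.max()
    if vmax == vmin:
        # Assumption: a constant group has no spread, so place it at the midpoint.
        return pd.Series((low + high) / 2, index=values.index)
    return low + (values - vmin) * (high - low) / (vmax - vmin)


# Toy data (made-up values) in the same long format as the question.
df = pd.DataFrame({
    "year": [2014, 2014, 2014, 2015, 2015, 2015, 2014, 2014, 2014],
    "variable": ["HICindex"] * 6 + ["Ioverlast"] * 3,
    "value": [10.0, 20.0, 30.0, 0.0, 5.0, 10.0, 100.0, 300.0, 500.0],
})

# transform() returns a result aligned with the original rows,
# so it can be assigned straight back as a new column.
df["scaled"] = df.groupby(["variable", "year"])["value"].transform(scale_to_range)

print(df)
```

After this, every (variable, year) group in `scaled` spans exactly 1 to 100, which gives the shared fixed axis needed for the radar chart. If scikit-learn is preferred, `sklearn.preprocessing.MinMaxScaler(feature_range=(1, 100))` performs the same linear mapping, but it would still have to be fitted separately per group, so the plain pandas version is probably simpler here.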
