Computing the frequency of each value in a list, and its associated percentage, in descending order

Problem description

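I am trying to write Python code that, for a given list of values (y) sorted in descending order, computes the frequency of each y value and, taking those frequencies into account, the associated percentage of samples with a greater y value (yi).

Thank you very much! I wrote Python code using NumPy, but I am getting errors when calculating the percentages and the frequencies. I want both to line up with the new array of y values without repeats (arr).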

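Solutions

Basics

list.count(item) returns the number of times item can be found in the list. list.index(item) returns the position of the first occurrence of item in the list, which is exactly the number of elements before it (because Python indexes lists from 0). Since the list is sorted in descending order, that is also the number of higher values.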

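y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330, 310, 310, 310, 310, 290]

def freq(item, lst):
    # number of occurrences of item in lst
    return lst.count(item)

def higher_perc(item, lst):
    # lst is sorted in descending order, so the index of the first
    # occurrence equals the number of strictly higher values
    return lst.index(item) / len(lst)

print(freq(370, y))  # 2
print(higher_perc(370, y))  # 0.2631578947368421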

If we want to apply this to several values, we can write a function that returns a function performing the operation, and then use map:

y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330, 310, 310, 310, 310, 290]
items = sorted(set(y), reverse=True)  # unique values, highest first

def create_freq_function(lst):
    def freq(item):
        return lst.count(item)
    return freq

def create_higher_perc_function(lst):
    def higher_perc(item):
        return lst.index(item) / len(lst)
    return higher_perc

print(items)
# [390, 370, 350, 330, 310, 290]
print(list(map(create_freq_function(y), items)))
# [5, 2, 1, 6, 4, 1]
print(list(map(create_higher_perc_function(y), items)))
# [0.0, 0.2631578947368421, 0.3684210526315789, 0.42105263157894735, 0.7368421052631579, 0.9473684210526315]
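
Calling list.count and list.index once per unique value rescans the list each time. As a rough alternative sketch (not part of the answer above), the same numbers can be obtained in a single counting pass with collections.Counter and itertools.accumulate. The function name freq_and_higher_share is made up for illustration, and the sample data is assumed to be the 19-value list used above.

```python
from collections import Counter
from itertools import accumulate

def freq_and_higher_share(values):
    """Return (value, count, share of strictly higher values), highest value first."""
    counts = Counter(values)                  # one pass over the data
    items = sorted(counts, reverse=True)      # unique values in descending order
    freqs = [counts[v] for v in items]
    # Running total of the counts *before* each item = number of strictly higher values
    higher = [0] + list(accumulate(freqs))[:-1]
    total = len(values)
    return [(v, f, h / total) for v, f, h in zip(items, freqs, higher)]

# Assumed sample data (same values as in the example above)
y = [390] * 5 + [370] * 2 + [350] + [330] * 6 + [310] * 4 + [290]

for value, count, share in freq_and_higher_share(y):
    print(value, count, round(share, 4))
```

Because the unique values are sorted inside the function, this sketch does not require the input list to be sorted beforehand.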

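NumPy

If the dataset is large, the numpy package can help. numpy.unique returns both the list of unique items and the number of times each one occurs, and numpy.cumsum accumulates the per-item percentages.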

import numpy as np

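y = np.array([390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330, 310, 310, 310, 310, 290])

items, freqs = np.unique(y, return_counts=True)  # unique values come back in ascending order
items, freqs = items[::-1], freqs[::-1]          # reverse so the highest value comes first
perc_freqs = freqs / len(y)
higher_percs = np.cumsum(perc_freqs) - perc_freqs  # cumulative share minus the item's own share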

print(items)
# [390 370 350 330 310 290]
print(freqs)
# [5 2 1 6 4 1]
print(higher_percs)
# [0.         0.26315789 0.36842105 0.42105263 0.73684211 0.94736842]
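
The subtraction in the cumsum line may not be obvious at first: the cumulative sum includes each item's own share, so removing it leaves only the share of strictly higher values. A small sanity check, sketched under the assumption that y is the 19-value sample from above, compares that result with the direct definition (the name direct is introduced here only for illustration):

```python
import numpy as np

# Assumed sample data (same values as in the example above)
y = np.array([390] * 5 + [370] * 2 + [350] + [330] * 6 + [310] * 4 + [290])

items, freqs = np.unique(y, return_counts=True)
items, freqs = items[::-1], freqs[::-1]            # highest value first
perc_freqs = freqs / len(y)
higher_percs = np.cumsum(perc_freqs) - perc_freqs

# Direct definition: share of samples strictly greater than each unique value
direct = np.array([(y > v).mean() for v in items])

print(np.allclose(higher_percs, direct))  # True
```

The direct version loops over the unique values in Python, so it is mainly useful as a check rather than as a replacement for the cumsum approach.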

You can use this function to compute the frequencies:

def frequencies(values, display_flag=True):
    freq = {}
    for val in values:
        if str(val) in freq:
            freq[str(val)] += 1
        else:
            freq[str(val)] = 1

    # Displaying frequencies
    if display_flag:
        for i in sorted(freq.keys()):
            print("Frequency of " + i + " is : " + str(freq[i]))

    return freq

You can use this function to compute the percentages:

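def percentages(values):
    freq = frequencies(values, False)
    total = len(values)
    current = 0

    # Keys are sorted in ascending order, so each line prints the cumulative
    # share of samples up to and including that value
    for i in sorted(freq.keys()):
        temp = freq[i] / total
        print("Percentage of " + i + " is : " + str(current + temp))
        current += temp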

Note that the percentages function calls the frequencies function, so the two are meant to be used together.
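
One thing to keep in mind with these two functions: the keys are stored as strings, so sorted() orders them character by character rather than by magnitude. That happens to work when every value has the same number of digits, but not otherwise. The sketch below illustrates the difference with made-up data. frequencies_numeric is a hypothetical variant that keeps the original values as keys, not part of the answer above.

```python
def frequencies_numeric(values):
    """Count occurrences, keeping the original numeric values as keys."""
    freq = {}
    for val in values:
        freq[val] = freq.get(val, 0) + 1
    return freq

sample = [1000, 290, 390, 290]  # hypothetical data containing a 4-digit value

# String keys sort character by character, so "1000" lands before "290"
print(sorted(str(v) for v in set(sample)))  # ['1000', '290', '390']

# Numeric keys sort by magnitude; reverse=True gives the descending order
freq = frequencies_numeric(sample)
for value in sorted(freq, reverse=True):
    print(value, freq[value])
```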


There are two ways to do this: with a traditional list, or with the more efficient numpy.

Using a list

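>>> y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330, 310, 310, 310, 310, 290]
# declare lambda functions to calculate the frequency and the percentage
>>> freq = lambda x: y.count(x)
>>> percent = lambda z: y.index(z) / len(y)
# map over the unique values only; they are sorted because a set has no guaranteed order
>>> items = sorted(set(y), reverse=True)
>>> items
[390, 370, 350, 330, 310, 290]
>>> print(list(map(freq, items)))
[5, 2, 1, 6, 4, 1]
>>> print(list(map(percent, items)))
[0.0, 0.2631578947368421, 0.3684210526315789, 0.42105263157894735, 0.7368421052631579, 0.9473684210526315]
# each frequency and percentage corresponds to the value at the same position in items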

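Using NumPy

I recommend this approach because it is fast and efficient, although you will only see a real benefit when working with relatively large datasets.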

>>> import numpy as np
>>> y_new = np.array(y)
>>> arr, count = np.unique(y_new, return_counts=True)  # very simple approach to get the output
>>> count
array([1, 4, 6, 1, 2, 5])
>>> arr
array([290, 310, 330, 350, 370, 390])
# define a vectorized percentage function based on the percent lambda defined previously
>>> vec_percent = np.vectorize(percent)
>>> np.unique(vec_percent(y_new))
array([0.        , 0.26315789, 0.36842105, 0.42105263, 0.73684211,
       0.94736842])
# you get your percentages (ascending, so the first one belongs to the highest value)
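
np.vectorize is a convenience wrapper that still calls the Python lambda once per element. As a possible alternative, sketched here under the assumption that the data is already sorted in descending order, the return_index option of np.unique gives the position of the first occurrence of each unique value, which plays the same role as list.index. The names first_idx and higher_share are introduced only for this illustration.

```python
import numpy as np

# Assumed sample data, sorted in descending order
y = [390] * 5 + [370] * 2 + [350] + [330] * 6 + [310] * 4 + [290]
y_new = np.array(y)

# first_idx[i] is the position of the first occurrence of arr[i] in y_new
arr, first_idx, count = np.unique(y_new, return_index=True, return_counts=True)

# np.unique sorts ascending, so reverse everything to list the highest value first
arr, first_idx, count = arr[::-1], first_idx[::-1], count[::-1]

# With descending input, the first index equals the number of strictly higher values
higher_share = first_idx / len(y_new)

print(arr)
print(count)
print(higher_share)
```

This keeps the values, frequencies, and percentages aligned position by position, which is what the question asks for with respect to arr.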

Which one to use is now up to you.
