How to create a data frame of a given size containing continuous and categorical values with a uniform random distribution

Problem description

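I am trying to generate some pseudo-random data of a given dimension. Essentially, I want a DataFrame whose data follow a uniform random distribution. The data contain both continuous and categorical values. I have written the following code, but it does not behave the way I want.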

import random
import pandas as pd
import time
from datetime import datetime

# declare global variables
adv_name = ['soft toys','kitchenware','electronics','mobile phones','laptops']
adv_loc = ['location_1','location_2','location_3','location_4','location_5']
adv_prod = ['baby product','laptops']
adv_size = [1,2,3,4,10]
adv_layout = ['static','dynamic']  # advertisement layout type on website

# adv_date,start_time,end_time = []
num = 10 # the given dimension

# define function to generate random advert locations
def rand_shuf_loc(str_lst,num):
    lst = str_lst  # use the list passed in rather than the global
    # using list comprehension
    rand_shuf_str = [item for item in lst for i in range(num)]
    return(rand_shuf_str)
    

# define function to generate random advert names
def rand_shuf_prod(loc_list,num):
    rand_shuf_str = [item for item in loc_list for i in range(num)]
    random.shuffle(rand_shuf_str)
    return(rand_shuf_str)

# define function to generate random impression and click data
def rand_clic_impr(num):
    rand_impr_lst = []
    click_lst = []
    for i in range(num):
        rand_impr_lst.append(random.randint(0,100))
        click_lst.append(random.randint(0,100))
    return {'rand_impr_lst': rand_impr_lst,'rand_click_lst': click_lst}

# define function to generate random product price and discount
def rand_prod_price_discount(num):
    prod_price_lst = []  # advertised product price
    prod_discnt_lst = []  # advertised product discount
    
    for i in range(num):
        prod_price_lst.append(random.randint(10,100))
        prod_discnt_lst.append(random.randint(10,100))
    
    return {'prod_price_lst': prod_price_lst,'prod_discnt_lst': prod_discnt_lst}

def rand_prod_click_timestamp(stime,etime,num):
    prod_clik_tmstmp = []
    frmt = '%d-%m-%Y %H:%M:%S'
        
    for i in range(num):
        rtime = int(random.random()*86400)
    
        hours   = int(rtime/3600)
        minutes = int((rtime - hours*3600)/60)
        seconds = rtime - hours*3600 - minutes*60
    
        time_string = '%02d:%02d:%02d' % (hours,minutes,seconds)
        prod_clik_tmstmp.append(time_string)
        time_stmp = [item for item in prod_clik_tmstmp for i in range(num)]
        
    return {'prod_clik_tmstmp_lst':time_stmp}

def main():
    print('generating data...')
    # print('generating random geographic coordinates...')
    # get the impressions and click data
    impression = rand_clic_impr(num)
    clicks = rand_clic_impr(num)
    product_price = rand_prod_price_discount(num)
    product_discount = rand_prod_price_discount(num)
    prod_clik_tmstmp = rand_prod_click_timestamp("20-01-2018 13:30:00","23-01-2018 04:50:34",num)
    lst_dict = {"ad_loc": rand_shuf_loc(adv_loc,num),"prod": rand_shuf_prod(adv_prod,num),"imprsn": impression['rand_impr_lst'],"cliks": clicks['rand_click_lst'],"prod_price": product_price['prod_price_lst'],"prod_discnt": product_discount['prod_discnt_lst'],"prod_clik_stmp": prod_clik_tmstmp['prod_clik_tmstmp_lst']}
    fake_data = pd.DataFrame.from_dict(lst_dict,orient="index")
    res = fake_data.apply(lambda x: x.fillna(0)
                          if x.dtype.kind in 'biufc'
                          # 'biufc' covers the boolean, signed integer, unsigned integer, float and complex dtype kinds
                          else x.fillna(random.randint(0,100)
                                        )
                          )
    print(res.transpose())
    res.to_csv("fake_data.csv",sep=",")

# invoke the main function
   
if __name__ == "__main__":
    main()

Question 1

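When I run the snippet above, it prints fine, but when it is written to CSV the data is laid out horizontally; that is, it looks like this.

[Image: wrong-data]

How can I lay the data out vertically when writing the CSV file? What I want is 7 columns (see the lst_dict variable above) and n rows.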

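Question 2

I do not understand why random dates are generated for the first 50 columns while the remaining columns are filled with numeric values.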

Solution

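To answer the first question, replace

print(res.transpose())

with

res = res.transpose()
print(res)

transpose() returns a new DataFrame rather than modifying res in place, so the result has to be assigned back to res before res.to_csv(...) is called.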
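As a minimal sketch of why the assignment matters, the snippet below builds a small frame the same way as fake_data (one row per field) and writes its transpose to an in-memory buffer. The names wide, tall, buffer and csv_text and the sample values are made up for illustration and are not part of the original code.

```python
import io
import pandas as pd

# Hypothetical small frame built like fake_data: one row per field.
wide = pd.DataFrame.from_dict(
    {"imprsn": [3, 7, 9], "cliks": [1, 0, 4]}, orient="index"
)

# transpose() returns a new DataFrame; wide itself is left unchanged.
tall = wide.transpose()

# Write the transposed frame to an in-memory buffer instead of a file.
buffer = io.StringIO()
tall.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(wide.shape, tall.shape)
print(csv_text)
```

Here wide keeps its original shape after the call, and only tall has one column per field, so it is the transposed object that should be passed to to_csv.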

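To answer the second question, look at the length of the output of

rand_shuf_loc()

It returns a list of 50 items (each of the 5 locations repeated num = 10 times), and the other helper functions return lists of other lengths, so the fields do not line up and the shorter ones are padded with NaN.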
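One way to check this is to compare the lengths of the lists that feed each field. The sketch below reproduces the repetition pattern used by the question's helpers; the lengths dictionary and the placeholder impression values are introduced here only for illustration.

```python
adv_loc = ['location_1', 'location_2', 'location_3', 'location_4', 'location_5']
adv_prod = ['baby product', 'laptops']
num = 10

# Same repetition pattern as rand_shuf_loc and rand_shuf_prod in the question.
locations = [item for item in adv_loc for _ in range(num)]
products = [item for item in adv_prod for _ in range(num)]
# rand_clic_impr appends one value per loop iteration (placeholder zeros here).
impressions = [0 for _ in range(num)]

lengths = {
    "ad_loc": len(locations),
    "prod": len(products),
    "imprsn": len(impressions),
}
print(lengths)  # {'ad_loc': 50, 'prod': 20, 'imprsn': 10}
```

Because the lists have different lengths, pandas has to pad the shorter ones with NaN when they are combined into a single frame.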
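res is created with

fake_data.apply

which replaces every NaN with a number (0 in numeric columns, otherwise a random integer from random.randint(0,100)), so numbers also end up in the positions that had no predefined value.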
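An alternative that avoids the padding altogether is to draw exactly n values for every column, so that no NaN is ever produced. The following is only a sketch under that assumption, not the asker's or the answerer's code: make_fake_data, its seed parameter and random_clock_time are hypothetical names, and the value ranges are copied from the question.

```python
import random
import pandas as pd

adv_loc = ['location_1', 'location_2', 'location_3', 'location_4', 'location_5']
adv_prod = ['baby product', 'laptops']

def random_clock_time(rng):
    """Return a uniformly sampled time of day as an HH:MM:SS string."""
    seconds = rng.randrange(86400)
    return '%02d:%02d:%02d' % (seconds // 3600, seconds % 3600 // 60, seconds % 60)

def make_fake_data(n, seed=None):
    """Return a DataFrame with n rows; every column holds n uniform samples."""
    rng = random.Random(seed)
    return pd.DataFrame({
        # Categorical columns: sample with replacement, uniformly over the options.
        "ad_loc": rng.choices(adv_loc, k=n),
        "prod": rng.choices(adv_prod, k=n),
        # Numeric columns: uniform integers over the ranges used in the question.
        "imprsn": [rng.randint(0, 100) for _ in range(n)],
        "cliks": [rng.randint(0, 100) for _ in range(n)],
        "prod_price": [rng.randint(10, 100) for _ in range(n)],
        "prod_discnt": [rng.randint(10, 100) for _ in range(n)],
        "prod_clik_stmp": [random_clock_time(rng) for _ in range(n)],
    })

fake_data = make_fake_data(10, seed=0)
print(fake_data.shape)  # (10, 7)
```

Since every list has length n, the frame already has 7 columns and n rows, and it can be written with fake_data.to_csv("fake_data.csv", index=False) without transposing or filling missing values.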
