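Problem description
I am trying to generate some pseudo-random data of a given dimension. Essentially, I want a dataframe whose data follow a uniform random distribution. The data contain both continuous and categorical values. I have written the following code, but it does not behave the way I want.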
import random
import pandas as pd
import time
from datetime import datetime
# declare global variables
adv_name = ['soft toys','kitchenware','electronics','mobile phones','laptops']
adv_loc = ['location_1','location_2','location_3','location_4','location_5']
adv_prod = ['baby product','laptops']
adv_size = [1,2,3,4,10]
adv_layout = ['static','dynamic'] # advertisement layout type on website
# adv_date,start_time,end_time = []
num = 10 # the given dimension
# define function to generate random advert locations
def rand_shuf_loc(str_lst,num):
    lst = adv_loc
    # using list comprehension
    rand_shuf_str = [item for item in lst for i in range(num)]
    return(rand_shuf_str)
# define function to generate random advert names
def rand_shuf_prod(loc_list,num):
    rand_shuf_str = [item for item in loc_list for i in range(num)]
    random.shuffle(rand_shuf_str)
    return(rand_shuf_str)
# define function to generate random impression and click data
def rand_clic_impr(num):
    rand_impr_lst = []
    click_lst = []
    for i in range(num):
        rand_impr_lst.append(random.randint(0,100))
        click_lst.append(random.randint(0,100))
    return {'rand_impr_lst': rand_impr_lst,'rand_click_lst': click_lst}
# define function to generate random product price and discount
def rand_prod_price_discount(num):
    prod_price_lst = [] # advertised product price
    prod_discnt_lst = [] # advertised product discount
    for i in range(num):
        prod_price_lst.append(random.randint(10,100))
        prod_discnt_lst.append(random.randint(10,100))
    return {'prod_price_lst': prod_price_lst,'prod_discnt_lst': prod_discnt_lst}
# define function to generate random click times (HH:MM:SS within one day)
def rand_prod_click_timestamp(stime,etime,num):
    prod_clik_tmstmp = []
    frmt = '%d-%m-%Y %H:%M:%S'  # note: not used below
    for i in range(num):
        rtime = int(random.random()*86400)
        hours = int(rtime/3600)
        minutes = int((rtime - hours*3600)/60)
        seconds = rtime - hours*3600 - minutes*60
        time_string = '%02d:%02d:%02d' % (hours,minutes,seconds)
        prod_clik_tmstmp.append(time_string)
    # each generated time is repeated num times, giving num*num items
    time_stmp = [item for item in prod_clik_tmstmp for i in range(num)]
    return {'prod_clik_tmstmp_lst':time_stmp}
def main():
    print('generating data...')
    # print('generating random geographic coordinates...')
    # get the impressions and click data
    impression = rand_clic_impr(num)
    clicks = rand_clic_impr(num)
    product_price = rand_prod_price_discount(num)
    product_discount = rand_prod_price_discount(num)
    prod_clik_tmstmp = rand_prod_click_timestamp("20-01-2018 13:30:00","23-01-2018 04:50:34",num)
    lst_dict = {"ad_loc": rand_shuf_loc(adv_loc,num),
                "prod": rand_shuf_prod(adv_prod,num),
                "imprsn": impression['rand_impr_lst'],
                "cliks": clicks['rand_click_lst'],
                "prod_price": product_price['prod_price_lst'],
                "prod_discnt": product_discount['prod_discnt_lst'],
                "prod_clik_stmp": prod_clik_tmstmp['prod_clik_tmstmp_lst']}
    fake_data = pd.DataFrame.from_dict(lst_dict,orient="index")
    res = fake_data.apply(lambda x: x.fillna(0)
                          if x.dtype.kind in 'biufc'
                          # 'biufc' covers boolean, signed integer, unsigned integer, float & complex dtypes
                          else x.fillna(random.randint(0,100)
                                        )
                          )
    print(res.transpose())
    res.to_csv("fake_data.csv",sep=",")
# invoke the main function
if __name__ == "__main__":
    main()
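Question 1
When I run the snippet above, it prints fine, but when it is written to csv the data is laid out horizontally. How can I lay it out vertically when writing the csv file? What I want is 7 columns (see the lst_dict variable above) and n rows.
Question 2
I don't understand why random dates are generated for the first 50 columns while the remaining columns are filled with numeric values.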
Solution
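To answer the first question, replace
print(res.transpose())
with
res = res.transpose()
print(res)
transpose() returns a new DataFrame rather than modifying res in place, so the result has to be assigned back before res.to_csv(...) writes it.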
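The effect of the transpose can be seen on a toy frame. This is only a minimal sketch: `toy` and `to_vertical_csv` are hypothetical names that do not appear in the question, and the CSV is returned as text instead of being written to fake_data.csv.

```python
import pandas as pd


def to_vertical_csv(frame: pd.DataFrame) -> str:
    """Transpose an index-oriented frame and return it as CSV text."""
    # transpose() returns a new frame; `frame` itself is left unchanged
    vertical = frame.transpose()
    # with no path given, to_csv returns the CSV content as a string
    return vertical.to_csv(sep=",", index=False)


# Toy data: with orient="index" each key becomes a row, as in the question.
toy = pd.DataFrame.from_dict(
    {"imprsn": [1, 2, 3], "cliks": [4, 5, 6]}, orient="index"
)
csv_text = to_vertical_csv(toy)
print(csv_text)
```

After the transpose, the dictionary keys become the column headers and each list position becomes one row.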
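To answer the second question, look at the length of the output of the method
rand_shuf_loc()
It returns a list of only 50 items (5 locations, each repeated num = 10 times), and the other helper functions do not all return lists of the same length either: with num = 10, rand_shuf_prod returns 20 items, rand_clic_impr and rand_prod_price_discount return 10 each, and rand_prod_click_timestamp returns 100.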
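The lengths can be checked directly. The sketch below re-creates the question's list comprehension under the hypothetical name `repeat_each` (not part of the original code) and then shows, on a tiny hypothetical dict, how `pd.DataFrame.from_dict(..., orient="index")` pads the shorter lists with NaN.

```python
import pandas as pd

adv_loc = ['location_1', 'location_2', 'location_3', 'location_4', 'location_5']
adv_prod = ['baby product', 'laptops']
num = 10


def repeat_each(items, num):
    """Same comprehension as in the question: every item repeated num times."""
    return [item for item in items for _ in range(num)]


lengths = {
    "ad_loc": len(repeat_each(adv_loc, num)),   # 5 items * 10 repeats
    "prod": len(repeat_each(adv_prod, num)),    # 2 items * 10 repeats
    "imprsn": len([0 for _ in range(num)]),     # one value per loop iteration
}
print(lengths)

# Lists of unequal length become rows that are padded with NaN on the right.
padded = pd.DataFrame.from_dict({"a": [1, 2, 3], "b": [4]}, orient="index")
print(padded)
```

Because the rows have different lengths, every row shorter than the longest one ends in NaN cells.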
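res is created with
fake_data.apply
which replaces every NaN with a number (0 in numeric columns, a single random integer otherwise), so numbers also end up in the cells that had no predefined value.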
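One way to avoid the NaN padding altogether is to give every column the same length and pass the dict straight to the DataFrame constructor. The following is a sketch under that assumption, not the original poster's method: `make_fake_data`, `n_rows` and `seed` are hypothetical names, and the value ranges are copied from the question's helpers.

```python
import random
import pandas as pd

adv_loc = ['location_1', 'location_2', 'location_3', 'location_4', 'location_5']
adv_prod = ['baby product', 'laptops']


def make_fake_data(n_rows, seed=None):
    """Build a frame with 7 columns and exactly n_rows rows, with no NaN."""
    rng = random.Random(seed)  # local generator so results can be reproduced

    def rand_time():
        # uniformly random second of the day, formatted as HH:MM:SS
        s = rng.randrange(86400)
        return '%02d:%02d:%02d' % (s // 3600, (s % 3600) // 60, s % 60)

    data = {
        # categorical columns: uniform draws with replacement
        "ad_loc": rng.choices(adv_loc, k=n_rows),
        "prod": rng.choices(adv_prod, k=n_rows),
        # numeric columns: uniform integers, same ranges as in the question
        "imprsn": [rng.randint(0, 100) for _ in range(n_rows)],
        "cliks": [rng.randint(0, 100) for _ in range(n_rows)],
        "prod_price": [rng.randint(10, 100) for _ in range(n_rows)],
        "prod_discnt": [rng.randint(10, 100) for _ in range(n_rows)],
        "prod_clik_stmp": [rand_time() for _ in range(n_rows)],
    }
    # all lists have n_rows items, so keys map to columns without padding
    return pd.DataFrame(data)


df = make_fake_data(10, seed=42)
print(df)
```

Since the frame is already column-oriented, `df.to_csv("fake_data.csv", index=False)` would write it vertically without a transpose or a fillna step.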