pandas (1): Selecting a subset of rows and columns and writing them to another file

1. Selecting columns

import pandas as pd
# Read the tab-separated source file.
df = pd.read_csv('zhihutest.csv', sep="\t")
# Categorical features (16)
fixlen_category_columns = ['m_sex', 'm_access_frequencies', 'm_twoA', 'm_twoB', 'm_twoC',
                           'm_twoD', 'm_twoe', 'm_categoryA', 'm_categoryB', 'm_categoryC',
                           'm_categoryD', 'm_categoryE', 'm_num_interest_topic', 'num_topic_attention_intersection',
                           'q_num_topic_words',
                           'num_topic_interest_intersection'
                         ]
# Numeric features (7)
fixlen_number_columns = ['m_salt_score', 'm_num_atten_topic', 'q_num_title_chars_words',
                         'q_num_desc_chars_words', 'q_num_desc_words', 'q_num_title_words',
                         'days_to_invite'
                        ]
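target = ['label']
text = ["q_title_words"]
# Total columns: 25 (1 label + 7 numeric + 16 categorical + 1 text)
# Numeric columns: 7
# Numeric + categorical columns: 23
cols = target + fixlen_number_columns + fixlen_category_columns + text
# Keep only the selected columns, in the order given by cols.
fout = df[cols]
print(fout)
# mode='a' appends to zhihu.txt; header=False and index=False omit the header row and the index.
fout.to_csv("zhihu.txt", mode='a', header=False, index=False, sep='\t')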
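The column selection above can be tried without the original file. The following is a minimal sketch on a small in-memory DataFrame; the values and the `unused` column are made up for illustration, and the check for missing columns is an addition that is not part of the original script.

```python
import io

import pandas as pd

# Toy stand-in for the real file; the values and the "unused" column are made up.
df_demo = pd.DataFrame({
    "label": [1, 0, 1],
    "m_salt_score": [420, 380, 510],
    "m_sex": ["male", "female", "unknown"],
    "unused": ["a", "b", "c"],
})

wanted = ["label", "m_salt_score", "m_sex"]

# df[wanted] raises KeyError if any name is absent, so check first
# to get a message that lists exactly which columns are missing.
missing = [c for c in wanted if c not in df_demo.columns]
if missing:
    raise KeyError(f"columns not found: {missing}")

# Selecting with a list of names returns the columns in that order.
subset = df_demo[wanted]

# Write tab-separated text without header or index, here into a string buffer
# instead of a file so the result can be inspected directly.
buf = io.StringIO()
subset.to_csv(buf, sep="\t", header=False, index=False)
demo_text = buf.getvalue()
print(demo_text)
```

If only a few columns of a large file are needed, `pd.read_csv` also accepts a `usecols` argument so the other columns are never loaded.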

2. Selecting rows

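import pandas as pd

# Read only the first 20,000 rows of the comma-separated file.
df = pd.read_csv('criteo_sampled_data.csv', sep=",", nrows=20000)
# Shuffle all rows.
df = df.sample(frac=1.0)
# The first 20% of the shuffled rows form the test set; the rest form the training set.
cut_idx = int(round(0.2 * df.shape[0]))
df_test, df_train = df.iloc[:cut_idx], df.iloc[cut_idx:]
df_train.to_csv("criteo_train.txt", index=False, sep='\t')
df_test.to_csv("criteo_test.txt", index=False, sep='\t')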
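Because `sample(frac=1.0)` shuffles differently on every run, the split above is not repeatable. The following is a minimal sketch of the same shuffle-and-slice split with a fixed seed, run on toy data; the column names `label` and `I1` and the seed value 42 are assumptions chosen for illustration.

```python
import pandas as pd

# Toy data with 10 rows; the column names are hypothetical.
toy = pd.DataFrame({"label": [0, 1] * 5, "I1": range(10)})

# random_state fixes the shuffle so repeated runs give the same order.
shuffled = toy.sample(frac=1.0, random_state=42)

# The first 20% of the shuffled rows become the test set, the rest the training set.
cut = int(round(0.2 * shuffled.shape[0]))
toy_test, toy_train = shuffled.iloc[:cut], shuffled.iloc[cut:]

print(len(toy_test), len(toy_train))
```

The two slices share no rows and together cover the whole DataFrame, since `iloc[:cut]` and `iloc[cut:]` partition the shuffled rows by position.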

 
