用数据框中的新值替换唯一值,熊猫?

问题描述

我有如下所示的数据框,我想通过替换列的唯一值来使其不敏感。即,我想用“ faker”库生成的一些虚假姓氏代替姓氏列。

代码段如下。

import pandas as pd
from faker import Faker
fake = Faker()
print(fake.first_name())
print(fake.last_name())
last = ('Meyer','Maier','Meyer','Mayer','Meyr','Mair')
job = ('data analyst','programmer','computer scientist','data scientist','accountant','psychiatrist')
language = ('Python','Perl','Java','Cobol','Brainfuck')

df = pd.DataFrame(list(zip(last,job,language)),columns =['last','job','language'],index=first) 

我想要的输出是使用假名称更改姓氏列,但是例如,应始终将Meyer替换为相同的假姓氏。

解决方法

获取所有唯一名称,创建具有映射唯一名称->假名称的字典,并将其映射到您的列:

import pandas as pd
first = ('Mike','Dorothee','Tom','Bill','Pete','Kate')
last = ('Meyer','Maier','Meyer','Mayer','Meyr','Mair')
job = ('data analyst','programmer','computer scientist','data scientist','accountant','psychiatrist')
language = ('Python','Perl','Java','Cobol','Brainfuck')

df = pd.DataFrame(list(zip(last,job,language)),columns =['last','job','language'],index=first) 
print(df)

# get all unique names - this can easily hande a couple tenthousand names
all_names = set(df["last"])

# create mapper: you would use fake.last_name() instead of 42+i
# mapper = {k: fake.last_name() for k in all_names }
mapper = {k: 42 + i for i,k in enumerate(all_names )}

# apply it
df["last"] = df["last"].map(mapper)
print(df)

输出:

# before
          last                 job   language
Mike      Meyer        data analyst     Python
Dorothee  Maier          programmer       Perl
Tom       Meyer  computer scientist       Java
Bill      Mayer      data scientist       Java
Pete       Meyr          accountant      Cobol
Kate       Mair        psychiatrist  Brainfuck

# after
          last                 job   language
Mike        44        data analyst     Python
Dorothee    43          programmer       Perl
Tom         44  computer scientist       Java
Bill        45      data scientist       Java
Pete        46          accountant      Cobol
Kate        47        psychiatrist  Brainfuck

相关问答

Selenium Web驱动程序和Java。元素在(x,y)点处不可单击。其...
Python-如何使用点“。” 访问字典成员?
Java 字符串是不可变的。到底是什么意思?
Java中的“ final”关键字如何工作?(我仍然可以修改对象。...
“loop:”在Java代码中。这是什么,为什么要编译?
java.lang.ClassNotFoundException:sun.jdbc.odbc.JdbcOdbc...