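Problem description
Suppose I have a data frame whose columns are class indicators, where a 1 marks that the item in that row belongs to the class.
I would like to add a new column, "class", that summarizes which classes each item in the index belongs to, as a list of strings.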
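How can I do this in a way that is both robust (handles NaN and items with multiple classes) and efficient?
At the moment, I cast each column to bool and multiply it by its column name inside an apply function, but this does not handle multiple classes or NaN well, and it is clearly not optimal.
Thanks for your help!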
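Solution
You can use numpy.where to get the positions where a 1 appears. From there, the column positions give you the labels and the row positions are used for alignment. This code worked for me: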
import numpy as np
import pandas as pd

# Allocate the output first so that rows without any label stay NaN
out = pd.Series(np.nan, index=df.index, dtype=object)
# df.eq(1) is False for NaN cells, so they never produce a label
for i, j in zip(*np.where(df.eq(1))):
    i = df.index[i]        # Extract dataframe index label instead of integer position
    label = df.columns[j]  # Extract relevant class label
    if isinstance(out[i], list):
        out[i].append(label)  # A list already exists in the out[i] cell, so append to it
    else:
        out[i] = [label]      # The cell is still NaN, so start a list with the class label
df["class"] = out
print(df)
   a  b  c  d   class
i
0  1  0  0  0     [a]
2  0  0  0  0     NaN
4  0  1  1  0  [b, c]
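
For anyone who wants to try this without building their own frame first, here is a minimal self-contained sketch of the same numpy.where idea. The function name add_class_column and the sample frame are my own assumptions for illustration (the frame mirrors the output above, with one NaN added to exercise that case). It collects the labels in a plain object array rather than a Series, which is a design choice of this sketch and not part of the answer above.

```python
import numpy as np
import pandas as pd


def add_class_column(df, col="class"):
    """Return a copy of df with a column listing the labels whose value is 1.

    Hypothetical helper for illustration; rows without any label get NaN.
    """
    mask = df.eq(1).to_numpy()      # NaN compares unequal to 1, so NaN cells are skipped
    rows, cols = np.where(mask)     # positions of every 1, in row-major order
    values = np.full(len(df), np.nan, dtype=object)
    for r, c in zip(rows, cols):
        if not isinstance(values[r], list):
            values[r] = []          # first label seen for this row
        values[r].append(df.columns[c])
    out = df.copy()
    out[col] = values
    return out


# Assumed sample data: same shape as the output above, plus one NaN in column c
df = pd.DataFrame(
    {"a": [1, 0, 0], "b": [0, 0, 1], "c": [0, np.nan, 1], "d": [0, 0, 0]},
    index=pd.Index([0, 2, 4], name="i"),
)
result = add_class_column(df)
print(result)
```

The loop still runs once per 1 in the frame, so it should scale with the number of labels rather than the number of cells. Whether that is fast enough depends on your data.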