One-hot columns to a new column of string lists

Problem description

Suppose I have this dataframe:

   a  b  c  d
i
0  1  0  0  0
2  0  0  0  0
4  0  1  1  0

I would like to add a new column "class" that summarizes, as a list of strings, the classes each item in the index belongs to.

How can I do this in a way that is robust (handles NaN and rows with multiple classes) and efficient?

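Currently, I convert each column value to bool and multiply it by its column name inside an apply function. However, this does not handle multiple classes or NaN well, and it is clearly not the best approach.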
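For comparison, here is a per-row version of the apply idea that handles both problems. It compares each row with 1 explicitly, so NaN never counts as a hit, and it collects every matching column name instead of multiplying. This is a sketch, not the asker's code. The frame below is hypothetical, including the NaN in column a, and `row_labels` is an illustrative helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame (index named "i"), with a NaN to test robustness
df = pd.DataFrame(
    {"a": [1, 0, np.nan], "b": [0, 0, 1], "c": [0, 0, 1], "d": [0, 0, 0]},
    index=pd.Index([0, 2, 4], name="i"),
)

def row_labels(row: pd.Series):
    """Hypothetical helper: column names where the row equals 1, or NaN if none."""
    # NaN == 1 is False, so missing values are never treated as a class
    hits = row.index[row.eq(1).to_numpy()]
    return list(hits) if len(hits) else np.nan

# With axis=1, each row is passed as a Series indexed by the column names
df["class"] = df.apply(row_labels, axis=1)
print(df)
```

This still calls a Python function once per row, so it suits small frames best. On large frames, working directly on the positions returned by NumPy is usually cheaper.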

Thanks for your help!

Solution

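You can use numpy.where to get the positions where a 1 appears. From there, the column positions identify the class labels, and the row positions are used for alignment. This code works for me: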

import numpy as np
import pandas as pd

# Example frame from the question (index named "i")
df = pd.DataFrame(
    {"a": [1, 0, 0], "b": [0, 0, 1], "c": [0, 0, 1], "d": [0, 0, 0]},
    index=pd.Index([0, 2, 4], name="i"),
)

# Allocate the output first so rows with no labels stay NaN
out = pd.Series(np.nan, index=df.index, dtype=object)

# Compare with 1 explicitly: NaN is truthy, so np.where(df) alone would count it as a hit
for i, j in zip(*np.where(df.eq(1))):
    i = df.index[i]          # Convert the integer row position to the index label
    label = df.columns[j]    # The column name is the class label

    if isinstance(out[i], list):
        out[i].append(label)     # The row already has a label: append to its list
    else:
        out[i] = [label]         # Still NaN: start a new list with this label

df["class"] = out

print(df)

   a  b  c  d   class
i
0  1  0  0  0     [a]
2  0  0  0  0     NaN
4  0  1  1  0  [b, c]
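If the Python loop becomes a bottleneck, you can finish the same np.where idea with a pandas groupby instead of appending to lists one hit at a time. This is an alternative sketch, not the answer author's code. It assumes the index labels are unique, because reindex requires that, and it reuses the example frame from the output above.

```python
import numpy as np
import pandas as pd

# Example frame matching the printed output above (index named "i")
df = pd.DataFrame(
    {"a": [1, 0, 0], "b": [0, 0, 1], "c": [0, 0, 1], "d": [0, 0, 0]},
    index=pd.Index([0, 2, 4], name="i"),
)

# Row/column positions where the frame equals 1 (NaN never equals 1)
rows, cols = np.nonzero(df.eq(1).to_numpy())

# One entry per hit: the row's index label paired with the column name
hits = pd.Series(df.columns[cols], index=df.index[rows])

# Gather each row's labels into a list; reindex puts back rows with no hits as NaN
df["class"] = hits.groupby(level=0, sort=False).agg(list).reindex(df.index)
print(df)
```

With a single argument, np.where is shorthand for nonzero, so both versions find the same positions. Passing sort=False keeps the groups in order of first appearance, and the final reindex restores every original row.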
