One-hot columns to a new column of string lists

Problem description

Suppose I have a DataFrame whose columns are one-hot (0/1) class indicators.

I want to add a new column "class" that summarizes, as a list of strings, which classes each item in the index belongs to.

How can I do this robustly (handling NaN and rows with multiple classes) and efficiently?

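Currently, I convert each column value to bool and multiply it by its column name inside an apply function, but it doesn't handle multiple classes or NaN well, and it is clearly not the best approach.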
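The NaN problem with a bool-based approach can be reproduced in a minimal sketch. The column `col` below is a made-up example, assuming the class columns are numeric 0/1 with NaN for missing values: NaN is truthy, so casting to bool counts it as a hit, whereas an explicit comparison with 1 does not.

```python
import numpy as np
import pandas as pd

# Hypothetical one-hot column with a missing value
col = pd.Series([1, 0, np.nan])

# NaN is truthy, so a bool cast marks the missing cell as a class member
as_bool = col.astype(bool).tolist()

# Comparing against 1 treats NaN as "not in this class"
as_eq = col.eq(1).tolist()

print(as_bool)  # [True, False, True]
print(as_eq)    # [True, False, False]
```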

Thanks for your help!

Solution

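You can use numpy.where to get the indices of the positions where a 1 occurs. From there, the column indices give you the class labels and the row indices are used for alignment. This code works for me: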

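import numpy as np
import pandas as pd

# Example one-hot frame matching the output shown below
df = pd.DataFrame(
    {"a": [1, 0, 0], "b": [0, 0, 1], "c": [0, 0, 1], "d": [0, 0, 0]},
    index=pd.Index([0, 2, 4], name="i"),
)

# Allocate the output first so rows with no labels stay NaN
out = pd.Series(np.nan, index=df.index, dtype=object)

# df.eq(1) ignores NaN cells (np.where(df) alone would treat NaN as nonzero)
for pos_i, pos_j in zip(*np.where(df.eq(1))):
    row = df.index[pos_i]        # Extract the index label instead of the integer position
    label = df.columns[pos_j]    # Extract the relevant class label

    if isinstance(out.at[row], list):
        out.at[row].append(label)    # The cell already holds a list: append to it
    else:
        out.at[row] = [label]        # The cell is still NaN: start a new list

df["class"] = out

print(df)
#    a  b  c  d   class
# i
# 0  1  0  0  0     [a]
# 2  0  0  0  0     NaN
# 4  0  1  1  0  [b, c]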
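The same idea can also be expressed without a per-cell Python loop. The sketch below is an alternative (not the answer's own code): it collects the `np.nonzero` hits into one Series and lets a single `groupby(...).agg(list)` build the lists, then `reindex` restores rows with no 1s as NaN. `onehot_to_lists` is a hypothetical helper name, and the speedup on large frames is an expectation, not something measured here.

```python
import numpy as np
import pandas as pd


def onehot_to_lists(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: map each row of 0/1 columns to a list of the
    column names holding a 1; rows with no 1s become NaN."""
    # Row/column positions of every cell equal to 1 (NaN compares as False)
    rows, cols = np.nonzero(df.eq(1).to_numpy())
    # One entry per hit: index label -> class label, in row-major order
    hits = pd.Series(df.columns[cols], index=df.index[rows])
    # Gather each row's labels into a list, then restore rows with no hits
    return hits.groupby(level=0).agg(list).reindex(df.index)


# Same example frame as above, with a NaN added in column "d"
df = pd.DataFrame(
    {"a": [1, 0, 0], "b": [0, 0, 1], "c": [0, 0, 1], "d": [0, np.nan, 0]},
    index=pd.Index([0, 2, 4], name="i"),
)
df["class"] = onehot_to_lists(df)
print(df)
```

Because the comparison is `df.eq(1)` rather than a truthiness test, the NaN in column "d" does not create a spurious label.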