Extracting co-occurrence data from a DataFrame

Problem description

I have something like this:

 fromJobtitle         toJobtitle         size
0              CEO                CEO    65
1              CEO     Vice President    23
2              CEO           Employee    56
3   Vice President                CEO   112
4         Employee                CEO    20

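I want to count the co-occurrences so that both directions are combined (that is, show only the total size between each pair of titles, regardless of which one is "from" and which one is "to").

Example output: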

0              CEO     Vice President   135
1              CEO           Employee    76
2              CEO                CEO    65

Solution

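import pandas as pd

# Rebuild the sample data from the question; all three lists must have five entries
df = pd.DataFrame({
    'fromJobtitle': ['CEO', 'CEO', 'CEO', 'Vice President', 'Employee'],
    'toJobtitle': ['CEO', 'Vice President', 'Employee', 'CEO', 'CEO'],
    'size': [65, 23, 56, 112, 20],
})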

Then add a column holding the sorted pair of titles, so that both directions get the same key:

df['combination'] = df.apply(
    lambda row: tuple(sorted([row['fromJobtitle'], row['toJobtitle']])),
    axis=1,
)

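Then group by that column and sum the sizes:

# Select 'size' explicitly so the string columns are not summed (concatenated) as well
df = df.groupby('combination')['size'].sum().reset_index()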

Result:

    combination             size
0   (CEO,CEO)              65
1   (CEO,Employee)         76
2   (CEO,Vice President)   135

Finally, split the tuple back into two columns:

df['from'] = df.apply(lambda row: row['combination'][0],axis=1)
df['to'] = df.apply(lambda row: row['combination'][1],axis=1)
df = df.drop('combination',axis=1)
df.head()
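
For larger frames, the row-wise `apply` calls can be avoided. The following is only a sketch of the same idea (sort each pair, then group and sum), assuming the sample data from the question; the use of `np.sort` and the name `result` are choices made for this illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'fromJobtitle': ['CEO', 'CEO', 'CEO', 'Vice President', 'Employee'],
    'toJobtitle': ['CEO', 'Vice President', 'Employee', 'CEO', 'CEO'],
    'size': [65, 23, 56, 112, 20],
})

# Sort each (from, to) pair alphabetically so both directions share one key.
pairs = np.sort(df[['fromJobtitle', 'toJobtitle']].to_numpy(), axis=1)

out = df[['size']].copy()
out['from'] = pairs[:, 0]
out['to'] = pairs[:, 1]

# Sum the sizes per unordered pair, largest total first.
result = (
    out.groupby(['from', 'to'], as_index=False)['size']
       .sum()
       .sort_values('size', ascending=False, ignore_index=True)
)
print(result)
```

Sorting by `size` in descending order is optional; it is included here only because the example output in the question lists the largest total first.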

Try:

df.groupby(lambda x: tuple(sorted(df.loc[x, ['fromJobtitle', 'toJobtitle']])))['size'].sum()

The function passed to groupby receives each index label, and the sorted tuple of the two titles in that row becomes the group key.

Here is a different solution:

First, create a column that combines the two names in alphabetical order:

import numpy as np

df['titles'] = np.where(df['fromJobtitle'] < df['toJobtitle'],
                        df['fromJobtitle'] + "|" + df['toJobtitle'],
                        df['toJobtitle'] + "|" + df['fromJobtitle'])
df['titles']

0               CEO|CEO
1    CEO|Vice President
2          CEO|Employee
3    CEO|Vice President
4          CEO|Employee
Name: titles, dtype: object

Then group by that name and sum:

df_groups = df.groupby('titles')['size'].sum().reset_index()
df_groups

titles              size
CEO|CEO             65
CEO|Employee        76
CEO|Vice President  135

Then simply split the combined name back into its separate parts.
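
The code for the final splitting step is not shown in this answer. A minimal sketch of how it might look, assuming the `|` separator used above and hypothetical output column names `from` and `to`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'fromJobtitle': ['CEO', 'CEO', 'CEO', 'Vice President', 'Employee'],
    'toJobtitle': ['CEO', 'Vice President', 'Employee', 'CEO', 'CEO'],
    'size': [65, 23, 56, 112, 20],
})

# Combined key with the two titles in alphabetical order, as in the answer above.
df['titles'] = np.where(
    df['fromJobtitle'] < df['toJobtitle'],
    df['fromJobtitle'] + '|' + df['toJobtitle'],
    df['toJobtitle'] + '|' + df['fromJobtitle'],
)
df_groups = df.groupby('titles')['size'].sum().reset_index()

# Split the key on '|' into two new columns; 'from' and 'to' are names
# chosen for this sketch. A single-character pattern is matched literally.
df_groups[['from', 'to']] = df_groups['titles'].str.split('|', expand=True)
df_groups = df_groups.drop(columns='titles')
print(df_groups)
```

This only works if no job title contains the separator character itself; otherwise a different separator, or the tuple-based approach, would be safer.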

dataframe find-occurrences pandas python