PapersWithCode API - retrieving the full areas-tasks-subtasks taxonomy

Problem description

I am looking for the complete PapersWithCode taxonomy: areas-tasks-subtasks.
PapersWithCode website: https://paperswithcode.com/
PapersWithCode API: https://paperswithcode.com/api/v1/docs/

I have already tried the PapersWithCode API. Here is the Python snippet I hoped to use to build the area-task-subtask mapping:

import requests

area_id = 'computer-vision'
q = f'https://paperswithcode.com/api/v1/areas/{area_id}/tasks/?page=1&items_per_page=500'
res = requests.get(q).json()

Output

[{'id': 'aesthetics-quality-assessment','name': 'Aesthetics Quality Assessment','description': 'Automatic assessment of aesthetic-related subjective ratings.'},{'id': 'user-constrained-thumbnail-generation','name': 'User Constrained Thumbnail Generation','description': 'Thumbnail generation is the task of generating image thumbnails from an input image.\r\n\r\n<span style="color:grey; opacity: 0.6">( Image credit: [User Constrained Thumbnail Generation using Adaptive Convolutions](https://arxiv.org/pdf/1810.13054v3.pdf) )</span>'},{'id': 'sensor-fusion','name': 'Sensor Fusion','description': '**Sensor Fusion** is the broad category of combining varIoUs on-board sensors to produce better measurement estimates. These sensors are combined to compliment each other and overcome individual shortcomings.\r\n\r\n\r\n<span class="description-source">Source: [Real Time Dense Depth Estimation by Fusing Stereo with Sparse Depth Measurements ](https://arxiv.org/abs/1809.07677)&lt;/span&gt;'},{'id': 'lip-sync-1','name': 'Constrained Lip-synchronization','description': 'This task deals with lip-syncing a video (or) an image to the desired target speech. Approaches in this task work only for a specific (limited set) of identities,languages,speech/voice. 
See also: Unconstrained lip-synchronization - https://paperswithcode.com/task/lip-sync'},{'id': 'online-multi-object-tracking','name': 'Online Multi-Object Tracking','description': 'The goal of **Online Multi-Object Tracking** is to estimate the spatio-temporal trajectories of multiple objects in an online video stream (i.e.,the video is provided frame-by-frame),which is a fundamental problem for numerous real-time applications,such as video surveillance,autonomous driving,and robot navigation.\r\n\r\n\r\n<span class="description-source">Source: [A Hybrid Data Association Framework for Robust Online Multi-Object Tracking ](https://arxiv.org/abs/1703.10764)&lt;/span&gt;'},{'id': 'cross-domain-few-shot','name': 'Cross-Domain Few-Shot','description': ''},...]

I checked the entire response; there is no information about whether a task has any parent task or subtasks.
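As an aside, the tasks endpoint is paginated, so a single `page=1` request may not return every task of an area. Assuming the API returns standard Django-REST-Framework-style pages (a dict with `results` and a `next` URL, which is what the docs describe), a small helper can follow the `next` links; `fetch_page` here is any callable that returns one decoded page, e.g. `lambda url: requests.get(url).json()`:

```python
def fetch_all(fetch_page, first_url):
    """Collect 'results' from every page, following 'next' links until exhausted."""
    results, url = [], first_url
    while url:
        page = fetch_page(url)
        results += page.get('results', [])
        url = page.get('next')  # None on the last page
    return results
```

This separates the paging logic from the HTTP call, which also makes it easy to test against canned pages.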

Workaround

I am working in a similar direction. They do not seem to expose the task-subtask taxonomy yet, either through the client API or through their data repository: https://github.com/paperswithcode/paperswithcode-data

Maybe you could try emailing them about a collaboration, or use a scraper.


As @JYL pointed out, the main resource to use can be found at https://github.com/paperswithcode/paperswithcode-data.

From there, the task-subtask information can be retrieved from the "evaluation tables" dump.

I was able to reconstruct the tree with the following Python code:
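Judging from the code below, each entry in `evaluation-tables.json` carries (at least) a `task` name, a `categories` list, and a `subtasks` list whose items have the same shape, so the hierarchy can be walked recursively. A minimal sketch with a made-up entry (the field values are illustrative, not real dump data):

```python
entry = {
    'task': 'Object Detection',
    'categories': ['Computer Vision'],
    'subtasks': [
        {'task': 'Few-Shot Object Detection', 'categories': [], 'subtasks': []},
    ],
}

def all_task_names(entry):
    """Yield this entry's task name, then recursively every subtask name."""
    yield entry['task']
    for sub in entry.get('subtasks', []):
        yield from all_task_names(sub)
```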

  1. 加载数据

# retrieving the full tasks hierarchy from the data dump
import pandas as pd
import json
import gzip

with gzip.open('data/evaluation-tables.json.gz', 'rt', encoding='utf-8') as fin:
    eval_tables = json.load(fin)

  2. Walk the JSON

def expand_tasks_tree(subtasks, parent, root, level):
    """Recursively flatten the nested task/subtask structure into flat records."""
    global index_dict
    r_tmp = []
    for subtask in subtasks:
        task = subtask['task']
        if task not in index_dict:
            index_dict[task] = max(index_dict.values()) + 1
        r_tmp += [{'level': level, 'root': root, 'parent': parent,
                   'task': task, 'id': index_dict[task],
                   'parent_id': index_dict[parent]}]
        # Not every entry carries a 'subtasks' key, so default to an empty list
        r_tmp += expand_tasks_tree(subtask.get('subtasks', []), task, root, level + 1)
    return r_tmp


index_dict = {'root': 0}
# Wrap each top-level entry under its first category (or 'uncategorized')
eval_all = [{'task': item['categories'][0] if item['categories'] else 'uncategorized',
             'subtasks': [item]} for item in eval_tables]
res = expand_tasks_tree(eval_all, 'root', '', 0)

  3. Then parse it into a dataframe following a parent-child schema:

pd.DataFrame.from_records(res).drop_duplicates(['level','parent','task','id','parent_id'])

This produces a dataframe with one row per task and the columns level, root, parent, task, id and parent_id.
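With the parent/child columns in place, lookups over the taxonomy reduce to ordinary pandas filtering. A small sketch (the hypothetical helper children_of and the two-row stand-in dataframe are mine, matching the column names produced by expand_tasks_tree above):

```python
import pandas as pd

def children_of(df, task_name):
    """Return the direct subtasks of task_name, looked up via the 'parent' column."""
    return df.loc[df['parent'] == task_name, 'task'].tolist()

# Tiny stand-in for the real output of expand_tasks_tree
df = pd.DataFrame.from_records([
    {'level': 0, 'root': '', 'parent': 'root',
     'task': 'Computer Vision', 'id': 1, 'parent_id': 0},
    {'level': 1, 'root': '', 'parent': 'Computer Vision',
     'task': 'Object Detection', 'id': 2, 'parent_id': 1},
])
```

Walking the parent column upwards in the same way recovers the full path from any subtask back to its area.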