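Problem description
I have two datasets in Foundry: df1 and df2. df1 contains data with a schema applied.
df2 is an empty dataframe with no schema applied.
Using the data proxy, I was able to extract the schema from df1: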
{
  "foundrySchema": {
    "fieldSchemaList": [
      {...
      }
    ],
    "primaryKey": null,
    "dataFrameReaderClass": "n/a",
    "customMetadata": {}
  },
  "rows": []
}
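How can I apply this schema to the empty dataframe df2 via a REST call?
The Foundry example below shows how to commit an empty transaction, but it does not show how to apply a schema: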
curl -X POST \
-H "Authorization: Bearer ${TOKEN}" \
-H "Content-Type: application/json" \
-d '{}' \
"${CATALOG_URL}/api/catalog/datasets/${DATASET_RID}/transactions/${TRANSACTION_RID}/commit"
Solution
Here is a Python function that uploads a schema for a dataset that has a committed transaction:
from urllib.parse import quote_plus

import requests


def upload_dataset_schema(dataset_rid: str, transaction_rid: str,
                          schema: dict, token: str, branch: str = 'master') -> None:
    """
    Uploads the foundry dataset schema for a dataset, transaction, branch combination

    Args:
        dataset_rid: The rid of the dataset
        transaction_rid: The rid of the transaction
        schema: The foundry schema
        token: The bearer token used for authorization
        branch: The branch

    Returns: None
    """
    # Replace "foundry-instance" with the hostname of your own Foundry instance.
    base_url = "https://foundry-instance/foundry-metadata/api"
    response = requests.post(
        f"{base_url}/schemas/datasets/{dataset_rid}/branches/{quote_plus(branch)}",
        params={'endTransactionRid': transaction_rid},
        json=schema,
        headers={
            'content-type': "application/json",
            'authorization': f"Bearer {token}",
        },
    )
    # Raise an HTTPError if the server answered with a 4xx or 5xx status.
    response.raise_for_status()
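The data proxy response shown in the question wraps the schema in a "foundrySchema" object next to "rows". Below is a minimal sketch of pulling that inner object out before passing it as the schema argument. It assumes the upload endpoint expects only the inner object, which the question does not confirm. The helper name extract_foundry_schema is hypothetical, and the example response uses an empty "fieldSchemaList" because the real field entries are elided in the question.

```python
import json


def extract_foundry_schema(data_proxy_response: dict) -> dict:
    """Return the inner "foundrySchema" object of a data proxy response.

    Hypothetical helper: assumes the response has the shape shown in the
    question, i.e. {"foundrySchema": {...}, "rows": [...]}.
    """
    schema = data_proxy_response.get("foundrySchema")
    if not isinstance(schema, dict):
        raise ValueError('response has no "foundrySchema" object')
    if "fieldSchemaList" not in schema:
        raise ValueError('"foundrySchema" has no "fieldSchemaList"')
    return schema


# Illustrative response only; the field list is left empty on purpose
# because the question does not show what a field entry looks like.
example_response = json.loads("""
{
  "foundrySchema": {
    "fieldSchemaList": [],
    "primaryKey": null,
    "dataFrameReaderClass": "n/a",
    "customMetadata": {}
  },
  "rows": []
}
""")

schema = extract_foundry_schema(example_response)
print(sorted(schema))  # top-level keys of the extracted schema
```

The resulting dict could then be passed on, for example upload_dataset_schema(dataset_rid, transaction_rid, schema, token), where the two rids and the token are your own values for df2 and its committed transaction.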