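How to extract Palantir Foundry data from custom Python on my local machine

Problem description

We are trying to read large files, but we cannot extract large Palantir Foundry datasets through a Tableau live connection or Power BI. So we are trying to connect to Palantir from Python instead. Can anyone suggest another way to extract large files from Palantir, or explain how to connect to Palantir from custom Python on my local system?

I searched the internet for references, but I always end up with the PySpark-style code that runs inside Palantir. I found the Python code below for pulling Palantir data into a dataframe, but I run into problems with it too: first an error code 400, then "Max retries exceeded with url: /foundry-data". Our Palantir base URL looks like https://XXXX.palantirfoundry.com/. When I supply this company base URL, I get a 405 error. Can anyone help?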

import requests
import pandas as pd

def query_foundry_sql(query, token, branch='master', base_url='https://foundry-instance.com') -> tuple[list, list]:
    """
    Queries the dataproxy query API with Spark SQL.
    Example: query_foundry_sql("SELECT * FROM `/path/to/dataset` LIMIT 5000", "ey...")
    Args:
        query: the SQL query
        token: the bearer token sent in the Authorization header
        branch: the branch of the dataset / query
        base_url: the base URL of the Foundry instance

    Returns: (columns, data) tuple. data contains the data matrix, columns the list of columns.
    Can be converted to a pandas DataFrame:
    pd.DataFrame(data=data, columns=columns)
    """
    response = requests.post(
        f"{base_url}/foundry-data-proxy/api/dataproxy/queryWithFallbacks",
        headers={'Authorization': f'Bearer {token}'},
        params={'fallbackBranchIds': [branch]},
        json={'query': query},
    )

    # Raise requests.HTTPError for any 4xx/5xx response
    response.raise_for_status()
    payload = response.json()  # parsed JSON body of the response
    columns = [e['name'] for e in payload['foundrySchema']['fieldSchemaList']]
    return columns, payload['rows']

columns, data = query_foundry_sql(
    "SELECT * FROM `/Global/Foundry Operations/Foundry Support/iris` LIMIT 5000",
    "ey...",
    base_url="https://foundry-instance.com",
)
df = pd.DataFrame(data=data, columns=columns)
df.head(5)
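
One detail that may be worth ruling out: a base URL that ends with a slash, like the https://XXXX.palantirfoundry.com/ quoted in the question, produces a double slash when it is joined by plain string formatting with a path that starts with a slash. Whether that explains the 405 here is only an assumption. The sketch below uses hypothetical names (`build_endpoint`, `QUERY_PATH`) that are not part of any Foundry library; the path itself is copied from the snippet above.

```python
# Path copied from the snippet in the question.
QUERY_PATH = "/foundry-data-proxy/api/dataproxy/queryWithFallbacks"


def build_endpoint(base_url: str, path: str) -> str:
    """Join a base URL and an API path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# Naive concatenation keeps the trailing slash and yields "...com//foundry-data-proxy/..."
naive = "https://XXXX.palantirfoundry.com/" + QUERY_PATH

# The helper yields a single slash whether or not the base URL ends with one.
print(build_endpoint("https://XXXX.palantirfoundry.com/", QUERY_PATH))
# https://XXXX.palantirfoundry.com/foundry-data-proxy/api/dataproxy/queryWithFallbacks
```

Printing the final URL before sending the request makes it easy to compare against what the server expects.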

Solution

No working solution to this problem has been found yet.
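
Until a confirmed fix turns up, it can help to look at what the server actually returns, since the exception raised by `raise_for_status()` reports the status line and URL but not the response body. The following is a minimal sketch that uses only the standard library. The function name `explain_failure` and the `HINTS` table are hypothetical, and the hints are generic HTTP semantics rather than Foundry-specific diagnostics.

```python
from http import HTTPStatus

# Hypothetical hint table written for this sketch (generic HTTP meanings only).
HINTS = {
    400: "the server rejected the request itself; check the SQL text and the JSON body",
    401: "the token is missing, expired, or malformed",
    403: "the token was accepted but lacks permission for this resource",
    404: "nothing was found at this path on this host",
    405: "the target URL does not accept this HTTP method; the path may be wrong",
}


def explain_failure(status_code: int, body: str, max_body: int = 500) -> str:
    """Build a readable message from a failed response's status code and body."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Status code not defined in the standard library's table
        phrase = "Unknown status"
    hint = HINTS.get(status_code, "no hint available")
    snippet = body.strip()[:max_body] or "<empty body>"
    return f"HTTP {status_code} {phrase}: {hint}\nServer said: {snippet}"


print(explain_failure(405, ""))
```

With `requests`, this could be called as `explain_failure(response.status_code, response.text)` before `response.raise_for_status()`, so that any error text sent by the server is visible.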


palantir-foundry pyspark python python-requests