How do I get the Spark SQL query execution plan as a DAG?

Problem description

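I am doing some analysis on Spark SQL query execution plans. The execution plan printed by the explain() API is not very readable. The Spark Web UI, by contrast, shows a DAG graph broken down into jobs, stages, and tasks, which is much easier to read. Is there a way to build that graph in code, either from the execution plan or through some API? If not, is there an API that can read the graph from the UI?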

Solution

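As far as I can see, this project (https://github.com/AbsaOSS/spline-spark-agent) is able to interpret the execution plan and emit it in a readable form. The Spark job in this example reads a file, converts it to a CSV file, and writes the result to local disk.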

Example JSON output

{
    "id": "3861a1a7-ca31-4fab-b0f5-6dbcb53387ca",
    "operations": {
        "write": {
            "outputSource": "file:/output.csv",
            "append": false,
            "id": 0,
            "childIds": [
                1
            ],
            "params": {
                "path": "output.csv"
            },
            "extra": {
                "name": "InsertIntoHadoopFsRelationCommand",
                "destinationType": "csv"
            }
        },
        "reads": [
            {
                "inputSources": [
                    "file:/Users/liajiang/Downloads/spark-onboarding-demo-application/src/main/resources/wikidata.csv"
                ],
                "id": 2,
                "schema": [
                    "6742cfd4-d8b6-4827-89f2-4b2f7e060c57",
                    "62c022d9-c506-4e6e-984a-ee0c48f9df11",
                    "26f1d7b5-74a4-459c-87f3-46a3df781400",
                    "6e4063cf-4fd0-465d-a0ee-0e5c53bd52b0",
                    "2e019926-3adf-4ece-8ea7-0e01befd296b"
                ],
                "params": {
                    "inferschema": "true",
                    "header": "true"
                },
                "extra": {
                    "name": "LogicalRelation",
                    "sourceType": "csv"
                }
            }
        ],
        "other": [
            {
                "id": 1,
                "childIds": [
                    2
                ],
                "params": {
                    "name": "`source`"
                },
                "extra": {
                    "name": "SubqueryAlias"
                }
            }
        ]
    },
    "systemInfo": {
        "name": "spark",
        "version": "2.4.2"
    },
    "agentInfo": {
        "name": "spline",
        "version": "0.5.5"
    },
    "extraInfo": {
        "appName": "spark-spline-demo-application",
        "dataTypes": [
            {
                "_typeHint": "dt.Simple",
                "id": "f0dede5e-8fe1-4c22-ab24-98f7f44a9a5a",
                "name": "timestamp",
                "nullable": true
            },
            {
                "_typeHint": "dt.Simple",
                "id": "dbe1d206-3d87-442c-837d-dfa47c88b9c1",
                "name": "string"
            },
            {
                "id": "0d786d1e-030b-4997-b005-b4603aa247d7",
                "name": "integer",
                "nullable": true
            }
        ],
        "attributes": [
            {
                "id": "6742cfd4-d8b6-4827-89f2-4b2f7e060c57",
                "name": "date",
                "dataTypeId": "f0dede5e-8fe1-4c22-ab24-98f7f44a9a5a"
            },
            {
                "id": "62c022d9-c506-4e6e-984a-ee0c48f9df11",
                "name": "domain_code",
                "dataTypeId": "dbe1d206-3d87-442c-837d-dfa47c88b9c1"
            },
            {
                "id": "26f1d7b5-74a4-459c-87f3-46a3df781400",
                "name": "page_title"
            },
            {
                "id": "6e4063cf-4fd0-465d-a0ee-0e5c53bd52b0",
                "name": "count_views",
                "dataTypeId": "0d786d1e-030b-4997-b005-b4603aa247d7"
            },
            {
                "id": "2e019926-3adf-4ece-8ea7-0e01befd296b",
                "name": "total_response_size",
                "dataTypeId": "0d786d1e-030b-4997-b005-b4603aa247d7"
            }
        ]
    }
}
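
To turn output like this into something closer to the DAG view in the Web UI, one option is to walk the "operations" section and follow the "childIds" links. The following is a minimal sketch in plain Python, not part of the Spline project. It assumes that "childIds" lists the operations feeding into a given operation, which is what the example suggests (write 0 <- SubqueryAlias 1 <- LogicalRelation 2). The helper names index_operations, render_tree and to_dot are hypothetical, and PLAN_JSON is a trimmed copy of the example that keeps only the fields used here.

```python
import json

# Trimmed copy of the example output: only the fields this sketch reads.
PLAN_JSON = """
{
  "operations": {
    "write": {"id": 0, "childIds": [1],
              "extra": {"name": "InsertIntoHadoopFsRelationCommand"}},
    "reads": [{"id": 2, "extra": {"name": "LogicalRelation"}}],
    "other": [{"id": 1, "childIds": [2], "extra": {"name": "SubqueryAlias"}}]
  }
}
"""


def index_operations(plan):
    """Map operation id -> operation dict across write, reads and other."""
    ops = plan["operations"]
    all_ops = [ops["write"], *ops.get("reads", []), *ops.get("other", [])]
    return {op["id"]: op for op in all_ops}


def render_tree(plan):
    """Return the operations as indented text, starting at the write."""
    by_id = index_operations(plan)
    lines = []

    def visit(op_id, depth):
        op = by_id[op_id]
        name = op["extra"]["name"]
        lines.append("  " * depth + f"[{op_id}] {name}")
        # Assumption: childIds are the inputs of this operation.
        for child_id in op.get("childIds", []):
            visit(child_id, depth + 1)

    visit(plan["operations"]["write"]["id"], 0)
    return "\n".join(lines)


def to_dot(plan):
    """Return Graphviz DOT text with one edge per input -> consumer pair."""
    by_id = index_operations(plan)
    out = ["digraph plan {"]
    for op_id, op in sorted(by_id.items()):
        name = op["extra"]["name"]
        out.append(f'  n{op_id} [label="{name}"];')
    for op_id, op in sorted(by_id.items()):
        for child_id in op.get("childIds", []):
            out.append(f"  n{child_id} -> n{op_id};")
    out.append("}")
    return "\n".join(out)


plan = json.loads(PLAN_JSON)
tree_text = render_tree(plan)
dot_text = to_dot(plan)
print(tree_text)
print(dot_text)
```

The DOT text can be saved to a file and rendered with Graphviz if it is installed. Note that this graph shows logical operations as reported by the agent, not the jobs, stages, and tasks that the Web UI displays.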

