python jsonschema: pattern on list items is not working

Problem description

I want to use a JSON schema to validate something like the following:

{
  "function_mapper": {
    "critical": [
      "DataContentValidation.get_multiple_types_columns",
      "DataContentValidation.get_invalid_session_ids"
    ],
    "warning": [
      "DataContentValidation.get_columns_if_most_data_null",
      "FeatureContentValidation.detect_inconsistencies"
    ]
  }
}

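I also want to use a regular expression to check that each list entry looks like Class.function. I tried changing one of the functions to 'dataContentValidation.get_multiple_types_columns'. I came up with this schema, but it does not work: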

{
    "type": "object",
    "properties": {
        "function_mapper": {
            "type": "object",
            "properties": {
                "critical": {
                    "type": "array",
                    "uniqueItems": True,
                    "items": {
                        "type": "string",
                        "pattern": r"[A-Z]\w+\.\w+"
                        # Todo add pattern that represent a class and function i.e: Class.function
                    }
                },
                "error": {
                    "type": "array",
                    "items": [
                        {
                            "type": "string",
                            "pattern": r"[A-Z]\w+\.\w+"
                            # Todo add pattern that represent a class and function i.e: Class.function
                        }
                    ]
                },
                "informative": {
                    "type": "array",
                    "items": [
                        {
                            "type": "string",
                            "pattern": r"[A-Z]\w+\.\w+"
                            # Todo add pattern that represent a class and function i.e: Class.function
                        }
                    ]
                }
            }
        },
        "days_gap": {"type": "integer", "minimum": 0},
        "timestamp_column": {"type": "string"},
        "missing_data_median_epsilon": {"type": "number", "minimum": 0, "maximum": 1},
        "group_by_time_unit": {"type": "string", "enum": ["d", "w", "m", "h", "T", "min", "s"]},
        "null_data_percentage": {"type": "number"},
        "common_feature_threshold": {"type": "number"},
        "columns_to_count": {"type": "array", "items": {"type": "string"}},
        "cluster_median_epsilon": {"type": "number"},
        "app_session_id_column": {"type": "string"}
    }
}

I also tried replacing items with contains, but it still does not work. What am I doing wrong?

Solution

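I see two problems with your schema that make it invalid JSON (apart from the comments).

For some reason, you have an r before the opening quote of the regular expression. This makes the JSON invalid.

You need to escape backslashes in JSON. Pasting your schema into a JSON-aware editor would highlight this error.

Strings in JSON need some escaping...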

Backspace is replaced with \b
Form feed is replaced with \f
Newline is replaced with \n
Carriage return is replaced with \r
Tab is replaced with \t
Double quote is replaced with \"
Backslash is replaced with \\

According to https://www.json.org
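To make the escaping rule concrete, here is a minimal sketch using only Python's standard json module. It assumes the pattern from the question and lets json.dumps produce the JSON text form, so the doubled backslashes do not have to be typed by hand. The variable names are only illustrative.

```python
import json

# The regular expression as the validator should receive it.
# The r prefix is Python raw-string syntax and is not part of JSON.
pattern = r"[A-Z]\w+\.\w+"

# json.dumps writes the JSON text form, doubling each backslash.
as_json = json.dumps({"type": "string", "pattern": pattern})
print(as_json)

# Parsing that text gives back the original regular expression.
assert json.loads(as_json)["pattern"] == pattern

# JSON text with a single backslash before "w" is rejected by the parser.
single_backslash_rejected = False
try:
    json.loads(r'{"pattern": "[A-Z]\w+"}')
except json.JSONDecodeError:
    single_backslash_rejected = True
print("single backslash rejected:", single_backslash_rejected)
```

If the schema lives in a .json file, the doubled backslashes belong in the file. If it lives in Python source as a dict, the raw string is fine and no extra escaping is needed.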

Without further hints about what is going wrong, I am not sure I can help more.
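One way to collect such hints, assuming the schema is used as a Python dict with the jsonschema package, is to ask the validator for every individual error rather than a single pass/fail result. The cut-down schema and the sample document below are made up for illustration and are not the asker's real data.

```python
import jsonschema

# Hypothetical cut-down schema: just one list using the question's pattern.
schema = {
    "type": "object",
    "properties": {
        "critical": {
            "type": "array",
            "items": {"type": "string", "pattern": r"[A-Z]\w+\.\w+"},
        }
    },
}

# Raises jsonschema.SchemaError if the schema itself is malformed.
jsonschema.Draft4Validator.check_schema(schema)
validator = jsonschema.Draft4Validator(schema)

# Made-up document: one good entry, one without a dot, one non-string.
document = {
    "critical": [
        "DataContentValidation.get_invalid_session_ids",
        "no_dot_here",
        42,
    ]
}

# iter_errors yields every failure instead of stopping at the first one.
errors = sorted(
    validator.iter_errors(document),
    key=lambda e: list(e.absolute_path),
)
for error in errors:
    location = "/".join(str(part) for part in error.absolute_path)
    print(f"{location}: [{error.validator}] {error.message}")

# Names of the schema keywords that failed, in document order.
failed_keywords = [error.validator for error in errors]
```

Each error carries the path to the offending value and the keyword that rejected it, which is usually enough to tell a schema mistake from a data mistake.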


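I had assumed you were using what you pasted as a Python dict, but @Relequestual's comment made me realize this may just be a JSON problem.

Here is a minimal example of what I did in Python. Does this help?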

import jsonschema
from pprint import pprint

schema = {
    "type": "object",
    "properties": {
        "array_of_strings": {
            "type": "array",
            "items": {
                "type": "string",
                "pattern": r"\w\d",  # a word character followed by a digit
            },
        }
    },
    "additionalProperties": False,
}

validator = jsonschema.Draft4Validator(schema)


def check(obj):
    """Pretty-print obj, then report whether it validates against the schema."""
    pprint(obj)
    result = "VALID" if validator.is_valid(obj) else "INVALID"
    print(f"=> {result}")

Then, run through a few test cases, it passes and fails as expected:

>>> check({"array_of_strings": []})
{'array_of_strings': []}
=> VALID

>>> check({"array_of_strings": [""]})
{'array_of_strings': ['']}
=> INVALID

>>> check({"array_of_strings": ["A4"]})
{'array_of_strings': ['A4']}
=> VALID

>>> check({"array_of_strings": ["A4", "4A"]})
{'array_of_strings': ['A4', '4A']}
=> INVALID

>>> check({"array_of_strings": ["A4"], "other_key": "123"})
{'array_of_strings': ['A4'], 'other_key': '123'}
=> INVALID

>>> check({})
{}
=> VALID

>>> check({"other_key": []})
{'other_key': []}
=> INVALID

>>> check({"array_of_strings": {}})
{'array_of_strings': {}}
=> INVALID
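Applying the same approach to the pattern from the question suggests two possible causes. Neither answer above states them, so treat this as a sketch under assumptions. First, JSON Schema patterns are not implicitly anchored, so [A-Z]\w+\.\w+ can match in the middle of a string. Second, in Draft 4 an items value written as a list is checked position by position, so a one-element list only constrains the first entry. The helper name list_validator and the anchored pattern are illustrative.

```python
import jsonschema

CLASS_DOT_FUNCTION = r"[A-Z]\w+\.\w+"  # pattern exactly as in the question
ANCHORED = r"^[A-Z]\w+\.\w+$"          # same pattern, anchored at both ends


def list_validator(item_pattern, tuple_form=False):
    """Build a Draft 4 validator for a list of strings matching item_pattern.

    With tuple_form=True the item schema is wrapped in a list, as in the
    question's "error" property, so it applies to the first entry only.
    """
    item_schema = {"type": "string", "pattern": item_pattern}
    items = [item_schema] if tuple_form else item_schema
    return jsonschema.Draft4Validator({"type": "array", "items": items})


lowercase = ["dataContentValidation.get_multiple_types_columns"]
proper = ["DataContentValidation.get_multiple_types_columns"]

# Unanchored: the match starts at the "C" of "ContentValidation".
unanchored_accepts_lowercase = list_validator(CLASS_DOT_FUNCTION).is_valid(lowercase)

# Anchored: the whole string must look like Class.function.
anchored_accepts_lowercase = list_validator(ANCHORED).is_valid(lowercase)
anchored_accepts_proper = list_validator(ANCHORED).is_valid(proper)

# List-form items: only the first entry is checked against the pattern.
tuple_form_accepts_bad_tail = list_validator(ANCHORED, tuple_form=True).is_valid(
    proper + lowercase
)

print(unanchored_accepts_lowercase, anchored_accepts_lowercase)
print(anchored_accepts_proper, tuple_form_accepts_bad_tail)
```

If this matches the behaviour seen in the question, writing items as a single schema object and anchoring the pattern with ^ and $ would be one way to make the lowercase entry fail.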