Parsing multiple AVRO .avsc files that reference each other, in Python with fastavro

Problem description

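I have an AVRO schema that currently lives in a single .avsc file, shown below. I now want to move the address record into a separate, shared .avsc file that many other .avsc files can reference, so that Customer and Address end up in separate .avsc files. How do I split them and make the Customer .avsc file reference the Address .avsc file? And how do I process both files in Python? I currently use fastavro in Python 3 to process the single .avsc file, but any other utility for Python 3 or PySpark would also be fine.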

File name: customer_details.avsc

[
    {
        "type": "record",
        "namespace": "com.company.model",
        "name": "AddressRecord",
        "fields": [
            {"name": "streetaddress", "type": "string"},
            {"name": "city", "type": "string"},
            {"name": "state", "type": "string"},
            {"name": "zip", "type": "string"}
        ]
    },
    {
        "namespace": "com.company.model",
        "type": "record",
        "name": "Customer",
        "fields": [
            {"name": "firstname", "type": "string"},
            {"name": "lastname", "type": "string"},
            {"name": "email", "type": "string"},
            {"name": "phone", "type": "string"},
            {
                "name": "address",
                "type": {
                    "type": "array",
                    "items": "com.company.model.AddressRecord"
                }
            }
        ]
    }
]
import fastavro.schema

# Loads the combined file; its top-level JSON array is an Avro union of both records
s1 = fastavro.schema.load_schema('customer_details.avsc')

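How can I split the schema into separate files so that the address record file can be referenced from other .avsc files? And how would I then process multiple .avsc files with fastavro (Python) or any other Python utility?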

Solution

For this to work, the AddressRecord schema must be in a file named com.company.model.AddressRecord.avsc, with the following contents:

{
    "type": "record",
    "namespace": "com.company.model",
    "name": "AddressRecord",
    "fields": [
        {"name": "streetaddress", "type": "string"},
        {"name": "city", "type": "string"},
        {"name": "state", "type": "string"},
        {"name": "zip", "type": "string"}
    ]
}
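The file name above is simply the schema's full name (namespace plus name) with an .avsc suffix. A minimal sketch of that rule, using a hypothetical helper expected_avsc_filename that is not part of fastavro; it follows the Avro rule that a name which already contains a dot is a full name, in which case the namespace is ignored:

```python
def expected_avsc_filename(schema: dict) -> str:
    """Hypothetical helper: return '<full name>.avsc' for a top-level named Avro schema."""
    name = schema["name"]
    namespace = schema.get("namespace")
    # Per the Avro spec, a dotted name is already a full name and any namespace is ignored
    if "." in name or not namespace:
        fullname = name
    else:
        fullname = f"{namespace}.{name}"
    return f"{fullname}.avsc"


address_record = {
    "type": "record",
    "namespace": "com.company.model",
    "name": "AddressRecord",
    "fields": [{"name": "streetaddress", "type": "string"}],
}
print(expected_avsc_filename(address_record))  # com.company.model.AddressRecord.avsc
```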

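The Customer schema does not strictly need to follow this naming convention, because it is the top-level schema, but following the same convention is probably a good idea. So it would go in a file named com.company.model.Customer.avsc, with the following contents: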

{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "phone", "type": "string"},
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": "com.company.model.AddressRecord"
            }
        }
    ]
}

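The files must be in the same directory, because load_schema looks for a referenced named type such as com.company.model.AddressRecord in a file called <full name>.avsc next to the file being loaded.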

Then you should be able to call fastavro.schema.load_schema('com.company.model.Customer.avsc').
