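Problem description
I have an Avro schema that currently lives in a single .avsc file, shown below. I now want to move the address record into a separate, shared .avsc file that many other .avsc files can reference, so that Customer and Address end up in separate .avsc files. How do I split them so that the Customer .avsc file references the address .avsc file? And how do I process both files in Python? I currently use fastavro in Python 3 to process the single .avsc file, but any other utility in Python 3 or PySpark would also be fine.
File name: customer_details.avsc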
[
  {
    "type": "record",
    "namespace": "com.company.model",
    "name": "AddressRecord",
    "fields": [
      {"name": "streetaddress", "type": "string"},
      {"name": "city", "type": "string"},
      {"name": "state", "type": "string"},
      {"name": "zip", "type": "string"}
    ]
  },
  {
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer",
    "fields": [
      {"name": "firstname", "type": "string"},
      {"name": "lastname", "type": "string"},
      {"name": "email", "type": "string"},
      {"name": "phone", "type": "string"},
      {
        "name": "address",
        "type": {
          "type": "array",
          "items": "com.company.model.AddressRecord"
        }
      }
    ]
  }
]
import fastavro
s1 = fastavro.schema.load_schema('customer_details.avsc')
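How do I split the schema into separate files so that the address record file can be referenced from other .avsc files? And how do I then process multiple .avsc files with fastavro (Python) or any other Python utility?
Solution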
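For this to work, the AddressRecord schema must be in a file named after its full name (namespace plus name), com.company.model.AddressRecord.avsc, with the following contents: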
{
  "type": "record",
  "namespace": "com.company.model",
  "name": "AddressRecord",
  "fields": [
    {"name": "streetaddress", "type": "string"},
    {"name": "city", "type": "string"},
    {"name": "state", "type": "string"},
    {"name": "zip", "type": "string"}
  ]
}
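The Customer schema does not strictly need to follow this naming convention, since it is the top-level schema, but following the same convention is probably a good idea. So it would go in a file named com.company.model.Customer.avsc, with the following contents: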
{
  "namespace": "com.company.model",
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "lastname", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "phone", "type": "string"},
    {
      "name": "address",
      "type": {
        "type": "array",
        "items": "com.company.model.AddressRecord"
      }
    }
  ]
}
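Both files must be in the same directory.
Then you should be able to call fastavro.schema.load_schema('com.company.model.Customer.avsc'). When it reaches the unresolved com.company.model.AddressRecord reference, it looks for com.company.model.AddressRecord.avsc in the directory of the file being loaded.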
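Instead of splitting the original customer_details.avsc by hand, a small script can write each named schema in the combined list to its own `<namespace>.<name>.avsc` file in one output directory. This is a minimal sketch; the function name `split_schemas` and the output directory are illustrative assumptions, and it assumes the combined file is a JSON list of top-level named schemas, as in the question.

```python
import json
import os
from typing import List


def split_schemas(combined_path: str, out_dir: str) -> List[str]:
    """Hypothetical helper: split a combined .avsc (a JSON list of named
    schemas) into one <namespace>.<name>.avsc file per type in out_dir."""
    with open(combined_path, encoding="utf-8") as f:
        schemas = json.load(f)
    if not isinstance(schemas, list):
        raise ValueError("expected a JSON list of schemas")

    os.makedirs(out_dir, exist_ok=True)  # all files end up in the same directory
    written = []
    for schema in schemas:
        name = schema["name"]
        namespace = schema.get("namespace")
        # A dotted name is already a full name; otherwise prefix the namespace.
        full_name = name if "." in name or not namespace else f"{namespace}.{name}"
        path = os.path.join(out_dir, full_name + ".avsc")
        with open(path, "w", encoding="utf-8") as out:
            json.dump(schema, out, indent=2)
        written.append(path)
    return written
```

For example, `split_schemas("customer_details.avsc", "schemas")` would produce schemas/com.company.model.AddressRecord.avsc and schemas/com.company.model.Customer.avsc. References inside Customer are left untouched, so they still point at the full name com.company.model.AddressRecord.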
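fastavro performs this lookup itself. To make the convention concrete, here is a minimal pure-Python sketch of the same idea; `resolve_schema` is hypothetical and is not fastavro's implementation. It walks a schema, and whenever it meets a named type it has not seen yet, it loads `<fullname>.avsc` from the same directory and inlines it once; later uses stay as by-name references. It ignores aliases and other edge cases.

```python
import json
import os

# Avro primitive type names; any other string is treated as a named-type reference.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = {"record", "error", "enum", "fixed"}


def _full_name(name, namespace):
    # A dotted name is already a full name; otherwise qualify it with the namespace.
    return name if "." in name or not namespace else f"{namespace}.{name}"


def resolve_schema(path):
    """Hypothetical helper: inline named types from <fullname>.avsc files next to path."""
    directory = os.path.dirname(os.path.abspath(path))
    defined = set()  # full names whose definition has already been emitted

    def load(p):
        with open(p, encoding="utf-8") as f:
            return json.load(f)

    def walk(schema, namespace):
        if isinstance(schema, str):
            if schema in PRIMITIVES:
                return schema
            name = _full_name(schema, namespace)
            if name in defined:
                # Defined earlier in the output: a by-name reference is enough.
                return name
            # Unknown named type: load <fullname>.avsc from the same directory.
            return walk(load(os.path.join(directory, name + ".avsc")), namespace)
        if isinstance(schema, list):  # union
            return [walk(branch, namespace) for branch in schema]
        kind = schema["type"]
        if kind in NAMED:
            name = _full_name(schema["name"], schema.get("namespace", namespace))
            defined.add(name)  # register before walking fields so recursive types terminate
            if kind in ("enum", "fixed"):
                return schema
            inner_ns = name.rpartition(".")[0]
            fields = [dict(f, type=walk(f["type"], inner_ns)) for f in schema["fields"]]
            return dict(schema, fields=fields)
        if kind == "array":
            return dict(schema, items=walk(schema["items"], namespace))
        if kind == "map":
            return dict(schema, values=walk(schema["values"], namespace))
        # e.g. {"type": "string", "logicalType": ...}: resolve the inner type
        return dict(schema, type=walk(kind, namespace))

    return walk(load(path), None)
```

Calling `resolve_schema("com.company.model.Customer.avsc")` would return a single self-contained Customer schema with AddressRecord inlined in the address array, which could then be passed to fastavro.parse_schema if you ever need to resolve the references yourself.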