Jolt group by and create array

Problem description

Hope you're all doing well.

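I'm new to the Jolt world and only started looking into it today. I'm having a lot of trouble converting my JSON result into the format I need to send.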

Here is an example of what I get:

[
  {
    "un": "RBA335","uf": "ES","city": "Cariacica","d0": 1,"day": "Mon","dzero": 1,"Active": 1
  },{
    "un": "RBA335","day": "Tue","day": "Wed","day": "Thu","day": "Fri","d0": 0,"day": "Sat","dzero": 0,"Active": 0
  },"day": "Sun","city": "Vitoria","Active": 0
  }
]

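I need to group by a key formed from the combination of "un", "uf", "city" and "d0", and for each group create an array named "windows" that holds the remaining fields ("day", "dzero", "Active"). For the input above, my expected result is: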

[
  {
    "un": "RBA335","windows": [
      {
        "day": "Mon","active": 1
      },{
        "day": "Tue",{
        "day": "Wed",{
        "day": "Thu",{
        "day": "Fri",{
        "day": "Sat","active": 0
      },{
        "day": "Sun","active": 0
      }
    ]
  },"city": "Vitória","active": 0
      }
    ]
  }
]

If you can help, it would be really appreciated.

Thanks in advance!

Solution

Try a spec built on the following approach:

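Create a composite key and add it to each element.
Move all elements that share the same composite key into a single key/value pair, where the key is the composite key and the value is the array of elements with that composite key value.
From the first element of each array, hoist the common fields up (except the composite key field). For every element in the array, push the day, dzero and active fields down into the "windows" array.
Put each composite-key object into one common top-level array.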
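To check the grouping logic outside NiFi, the four steps above can be sketched in plain Python. This is not a Jolt spec and not the answer's original spec; it only mirrors the same steps. It assumes every input record carries `un`, `uf`, `city`, `d0`, `day`, `dzero` and `Active`, and it renames `Active` to `active` inside `windows` to match the expected output in the question. The function name `group_into_windows` and the sample records are hypothetical.

```python
import json
from typing import Any

# Fields that form the composite grouping key, as listed in the question.
KEY_FIELDS = ("un", "uf", "city", "d0")


def group_into_windows(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group records by (un, uf, city, d0) and collect the per-day fields into "windows".

    Plain-Python illustration of the steps above; not a Jolt spec.
    """
    # Steps 1 and 2: compute a composite key per record and bucket records by it.
    # Dicts keep insertion order, so groups appear in first-seen order.
    buckets: dict[tuple, list[dict[str, Any]]] = {}
    for rec in records:
        composite_key = tuple(rec.get(field) for field in KEY_FIELDS)
        buckets.setdefault(composite_key, []).append(rec)

    result: list[dict[str, Any]] = []
    for group in buckets.values():
        # Step 3a: hoist the shared fields from the first element of the group.
        grouped = {field: group[0].get(field) for field in KEY_FIELDS}
        # Step 3b: push day, dzero and Active (renamed to "active") into "windows".
        grouped["windows"] = [
            {"day": rec.get("day"), "dzero": rec.get("dzero"), "active": rec.get("Active")}
            for rec in group
        ]
        # Step 4: gather every grouped object into one top-level array.
        result.append(grouped)
    return result


if __name__ == "__main__":
    # Hypothetical sample records, not the question's full data set.
    sample = [
        {"un": "RBA335", "uf": "ES", "city": "Cariacica", "d0": 1, "day": "Mon", "dzero": 1, "Active": 1},
        {"un": "RBA335", "uf": "ES", "city": "Cariacica", "d0": 1, "day": "Tue", "dzero": 1, "Active": 1},
        {"un": "RBA335", "uf": "ES", "city": "Vitoria", "d0": 0, "day": "Sun", "dzero": 0, "Active": 0},
    ]
    print(json.dumps(group_into_windows(sample), indent=2))
```

A tuple serves as the composite key here because it cannot collide the way a joined string can when a value contains the separator. A Jolt spec would more typically build a concatenated string key (for example with `modify-overwrite-beta` and `=concat`) and then group on it with `shift`.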


apache-nifi arrays jolt json