Neo4j OGM: list field is empty when loading from multiple threads

Problem description

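I am trying to index my Neo4j database into Solr.
To speed up this lengthy process I want to use multiple threads. However, when I do, at least one field sometimes ends up empty, whereas the same field is populated with a list of names when the indexing runs single-threaded.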

The path from the instances we index to that field is:

(PhysicalEntity) --[ReferenceEntity]--> (ReferenceEntity.name)

This is my current code:

 private int indexBySchemaClass(Class<? extends DatabaseObject> clazz) throws IndexerException {
    Collection<Long> allOfGivenClass = schemaService.getDbIdsByClass(clazz);

    LinkedBlockingQueue<IndexDocument> allDocuments = new LinkedBlockingQueue<>(500);
    List<Long> missingDocuments = Collections.synchronizedList(new ArrayList<>());
    
    // ...

    allOfGivenClass.parallelStream().forEach( dbId -> {
        try {
            IndexDocument document = this.documentBuilder.createSolrDocument(dbId); // transactional

            if (document != null) {
                allDocuments.put(document);
            } else {
                missingDocuments.add(dbId);
            }

        } catch (InterruptedException e) {
            e.printStackTrace();
            // Really, I'm not expecting any interruptions... so if one is caught, interrupt *this* thread
            Thread.currentThread().interrupt();
        }
    });
}

This is DocumentBuilder.createSolrDocument(dbId):

@Transactional
IndexDocument createSolrDocument(Long dbId) {
    IndexDocument document = new IndexDocument();
    /*
     * Query the Graph and load only Primitives and no Relations attributes.
     * Lazy-loading will load them on demand.
     */
    DatabaseObject databaseObject;
    try {
        databaseObject = databaseObjectRepository.findByDbId(dbId);
    } catch (MappingException e) {
        logger.error("There has been an error mapping the object with dbId: " + dbId, e);
        return null;
    }

    document.setDbId(databaseObject.getDbId().toString());
    // ...
    if (databaseObject instanceof PhysicalEntity) {
        PhysicalEntity physicalEntity = (PhysicalEntity) databaseObject;
        // ...
        setReferenceEntity(document, physicalEntity);
    } else if (databaseObject instanceof Event) {
        // ...
    }
    return document;
}

The DatabaseObject repository:

@Repository
public interface DatabaseObjectRepository extends GraphRepository<DatabaseObject> {

    // Derived query
    <T extends DatabaseObject> T findByDbId(Long dbId);

    @Query("MATCH (n:DatabaseObject{dbId:{0}}) RETURN n")
    <T extends DatabaseObject> T findByDbIdNoRelations(Long dbId);

    @Query("MATCH (n:DatabaseObject) WHERE n.dbId IN {0} RETURN n")
    <T extends DatabaseObject> Collection<T> findByDbIdsNoRelations(Collection<Long> dbIds);
}

And finally, setReferenceEntity:

private void setReferenceEntity(IndexDocument document, DatabaseObject databaseObject) {
    if (databaseObject == null) return;

    ReferenceEntity referenceEntity = null;

    if (databaseObject instanceof EntityWithAccessionedSequence) {
        EntityWithAccessionedSequence ewas = (EntityWithAccessionedSequence) databaseObject;
        referenceEntity = ewas.getReferenceEntity();
    } else if (databaseObject instanceof SimpleEntity) {
        SimpleEntity simpleEntity = (SimpleEntity) databaseObject;
        referenceEntity = simpleEntity.getReferenceEntity();
    }

    if (referenceEntity != null) {
        String identifier = referenceEntity.getIdentifier();
        // ...

        /*
          The problem is here, e.g. with PTEN (R-HSA-199420):
             - Single thread, referenceEntity.getName()  ==> ["PTEN"]
             - Multi thread, referenceEntity.getName()  ==> []
         */
        if (referenceEntity.getName() != null && !referenceEntity.getName().isEmpty()) {
            // getName() is used as a list above (isEmpty()), so read its first element with get(0)
            document.setReferenceName(referenceEntity.getName().get(0));
            // ...
        }
        // ...
    }
}

Here are debugger screenshots that demonstrate this odd behaviour:

Single-threaded:

[Screenshot: PTEN on single-threaded indexing with referenceName filled]

Multi-threaded:

[Screenshot: PTEN on multi-threaded indexing with referenceName empty]

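My wild guess is that the lazy loading of instances in the OGM does not cope well with multi-threading, but I really don't know how to work around it, or even whether that is the actual problem. Another odd thing is that we have not noticed a similar issue in any field other than referenceEntity.name.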
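
One way to test that guess would be to load the same ids once sequentially and once through parallelStream(), then list the ids whose results differ. The following is only a minimal sketch of such a check: ParallelLoadCheck, findMismatches and the loader function are hypothetical names, not part of Neo4j OGM or of the code above. In the real indexer the loader would wrap something like createSolrDocument(dbId) and return the reference names; here a stand-in lambda is used so the snippet runs on its own.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Function;

class ParallelLoadCheck {

    /**
     * Applies the (hypothetical) loader to every id sequentially, then again
     * via parallelStream(), and returns the ids whose two results differ,
     * each mapped to a "sequential vs parallel" description.
     */
    static <R> Map<Long, String> findMismatches(Collection<Long> ids, Function<Long, R> loader) {
        // Sequential pass: the baseline we trust.
        Map<Long, R> sequential = new HashMap<>();
        for (Long id : ids) {
            sequential.put(id, loader.apply(id));
        }

        // Parallel pass: same loader, same ids, compared against the baseline.
        // The baseline map is only read here, so sharing it is safe.
        Map<Long, String> mismatches = new ConcurrentSkipListMap<>();
        ids.parallelStream().forEach(id -> {
            R parallel = loader.apply(id);
            R expected = sequential.get(id);
            if (!Objects.equals(expected, parallel)) {
                mismatches.put(id, expected + " vs " + parallel);
            }
        });
        return mismatches;
    }

    public static void main(String[] args) {
        // Stand-in loader (an assumption): replace with a call that returns
        // the names loaded for the given dbId.
        Function<Long, List<String>> loader = id -> List.of("name-" + id);

        Map<Long, String> mismatches = findMismatches(List.of(1L, 2L, 3L), loader);
        System.out.println("Mismatching ids: " + mismatches.keySet());
    }
}
```

If ids such as the one for PTEN show up in the result, the difference is tied to concurrent loading rather than to the data itself. One caveat: the sequential pass may warm whatever session-level cache is in play, so it may be worth running the two passes in separate runs as well.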
