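Problem description
I'm trying to index my Neo4j database into Solr.
To speed up this lengthy process, I want to use multiple threads. However, when I do so, at least one field sometimes ends up empty, whereas without multithreading the same field is filled with a list of names.
The exact path from the indexed instance to that field is: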
(PhysicalEntity) --[ReferenceEntity]--> (ReferenceEntity.name)
Here is my current code:
private int indexBySchemaClass(Class<? extends DatabaseObject> clazz) throws IndexerException {
    Collection<Long> allOfGivenClass = schemaService.getDbIdsByClass(clazz);
    LinkedBlockingQueue<IndexDocument> allDocuments = new LinkedBlockingQueue<>(500);
    List<Long> missingDocuments = Collections.synchronizedList(new ArrayList<>());
    // ...
    allOfGivenClass.parallelStream().forEach(dbId -> {
        try {
            IndexDocument document = this.documentBuilder.createSolrDocument(dbId); // transactional
            if (document != null) {
                allDocuments.put(document);
            } else {
                missingDocuments.add(dbId);
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
            // I'm not really expecting any interruptions, so if one is caught, re-interrupt *this* thread
            Thread.currentThread().interrupt();
        }
    });
    // ... (rest of the method, including the int return value, elided)
}
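One thing worth checking in the method above: `new LinkedBlockingQueue<>(500)` is bounded, so `put()` blocks once 500 documents are waiting. Unless something in the elided `// ...` drains the queue concurrently, the parallel stream would stall after 500 documents. Below is a minimal JDK-only sketch of a bounded producer/consumer arrangement, assuming a single consumer thread that submits batches; the `String` documents, the `indexBatch` callback and the `__END__` sentinel are hypothetical stand-ins for `IndexDocument` and the actual Solr submit call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.stream.LongStream;

final class BoundedIndexer {
    // Hypothetical sentinel telling the consumer that no more documents will arrive.
    private static final String POISON = "__END__";

    /** Builds one hypothetical "document" per id in parallel and hands batches to indexBatch. */
    static void run(long count, int batchSize, Consumer<List<String>> indexBatch) {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(500);

        // Consumer thread: drains the bounded queue so producers never block forever.
        Thread consumer = new Thread(() -> {
            List<String> batch = new ArrayList<>();
            try {
                while (true) {
                    String doc = queue.take();
                    if (doc.equals(POISON)) break;
                    batch.add(doc);
                    if (batch.size() == batchSize) {
                        indexBatch.accept(batch);
                        batch = new ArrayList<>();
                    }
                }
                if (!batch.isEmpty()) indexBatch.accept(batch); // flush the remainder
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producers: a parallel stream, as in the original method.
        LongStream.range(0, count).parallel().forEach(dbId -> {
            try {
                queue.put("doc-" + dbId);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            queue.put(POISON); // forEach has returned, so every producer is done
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

For example, `BoundedIndexer.run(1234, 100, batches::add)` delivers 12 full batches plus one batch of 34, even though far more than 500 documents pass through the queue.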
Here is DocumentBuilder.createSolrDocument(dbId):
@Transactional
IndexDocument createSolrDocument(Long dbId) {
    IndexDocument document = new IndexDocument();
    /*
     * Query the graph and load only primitives, no relationship attributes.
     * Lazy loading will load those on demand.
     */
    DatabaseObject databaseObject;
    try {
        databaseObject = databaseObjectRepository.findByDbId(dbId);
    } catch (MappingException e) {
        logger.error("There has been an error mapping the object with dbId: " + dbId, e);
        return null;
    }
    document.setDbId(databaseObject.getDbId().toString());
    // ...
    if (databaseObject instanceof PhysicalEntity) {
        PhysicalEntity physicalEntity = (PhysicalEntity) databaseObject;
        // ...
        setReferenceEntity(document, physicalEntity);
    } else if (databaseObject instanceof Event) {
        // ...
    }
    return document;
}
The DatabaseObject repository:
@Repository
public interface DatabaseObjectRepository extends GraphRepository<DatabaseObject> {
    // Derived query
    <T extends DatabaseObject> T findByDbId(Long dbId);

    @Query("MATCH (n:DatabaseObject{dbId:{0}}) RETURN n")
    <T extends DatabaseObject> T findByDbIdNoRelations(Long dbId);

    @Query("MATCH (n:DatabaseObject) WHERE n.dbId IN {0} RETURN n")
    <T extends DatabaseObject> Collection<T> findByDbIdsNoRelations(Collection<Long> dbIds);
}
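Because the repository already has a `WHERE n.dbId IN {0}` query that accepts a `Collection<Long>`, one option is to fetch nodes in batches rather than issuing one query per dbId. Here is a small JDK-only helper for splitting the ids. The `IdBatches` name and any batch size you pick are assumptions, and whether batching helps here depends on how the lazily loaded relations are fetched afterwards.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class IdBatches {
    /** Splits ids into consecutive chunks of at most batchSize elements, preserving iteration order. */
    static List<List<Long>> partition(Collection<Long> ids, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>(batchSize);
        for (Long id : ids) {
            current.add(id);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>(batchSize);
            }
        }
        if (!current.isEmpty()) batches.add(current); // trailing partial chunk
        return batches;
    }
}
```

Each chunk could then be passed to the collection-based repository query, one query per chunk.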
And finally, setReferenceEntity:
private void setReferenceEntity(IndexDocument document, DatabaseObject databaseObject) {
    if (databaseObject == null) return;
    ReferenceEntity referenceEntity = null;
    if (databaseObject instanceof EntityWithAccessionedSequence) {
        EntityWithAccessionedSequence ewas = (EntityWithAccessionedSequence) databaseObject;
        referenceEntity = ewas.getReferenceEntity();
    } else if (databaseObject instanceof SimpleEntity) {
        SimpleEntity simpleEntity = (SimpleEntity) databaseObject;
        referenceEntity = simpleEntity.getReferenceEntity();
    }
    if (referenceEntity != null) {
        String identifier = referenceEntity.getIdentifier();
        // ...
        /*
         * The problem is here, e.g. with PTEN (R-HSA-199420):
         *  - single thread: referenceEntity.getName() ==> ["PTEN"]
         *  - multi-thread:  referenceEntity.getName() ==> []
         */
        if (referenceEntity.getName() != null && !referenceEntity.getName().isEmpty()) {
            // getName() returns a list, so take the first element with get(0)
            document.setReferenceName(referenceEntity.getName().get(0));
            // ...
        }
        // ...
    }
}
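One mechanism that would produce exactly the symptom in the comment above (`["PTEN"]` single-threaded, `[]` multi-threaded) is a shared identity cache that publishes an object before its properties are populated: a second thread finds the half-built object in the cache and reads an empty name. This is a hypothesis, not a claim about how Neo4j OGM's mapping context is implemented. The self-contained sketch below forces that interleaving with latches and contrasts it with building the object fully inside `ConcurrentHashMap.computeIfAbsent`; `Entity` and both caches are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

final class IdentityMapRace {
    // Hypothetical stand-in for a mapped node such as ReferenceEntity.
    static final class Entity {
        final long dbId;
        final List<String> name = new ArrayList<>();
        Entity(long dbId) { this.dbId = dbId; }
    }

    // UNSAFE pattern: the entity is registered in a shared cache before its properties are loaded.
    static final Map<Long, Entity> unsafeCache = new ConcurrentHashMap<>();

    static void loadUnsafe(long dbId, CountDownLatch published, CountDownLatch resume) {
        Entity e = new Entity(dbId);
        unsafeCache.put(dbId, e); // visible to other threads while still empty
        published.countDown();    // the reader may now look it up
        awaitQuietly(resume);     // force the reader to run before population finishes
        e.name.add("PTEN");       // properties are filled in only afterwards
    }

    // SAFER pattern: build the entity completely inside computeIfAbsent, then publish it.
    static final Map<Long, Entity> safeCache = new ConcurrentHashMap<>();

    static Entity loadSafe(long dbId) {
        return safeCache.computeIfAbsent(dbId, id -> {
            Entity e = new Entity(id);
            e.name.add("PTEN");
            return e;
        });
    }

    /** Deterministically reproduces a reader seeing [] while the loader later sets ["PTEN"]. */
    static List<String> nameSeenByReaderUnsafe() {
        CountDownLatch published = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);
        Thread loader = new Thread(() -> loadUnsafe(42L, published, resume));
        loader.start();
        awaitQuietly(published);
        List<String> seen = new ArrayList<>(unsafeCache.get(42L).name); // snapshot taken mid-load
        resume.countDown();
        joinQuietly(loader);
        return seen;
    }

    /** Two threads race to load the same id; whichever loses receives the fully built entity. */
    static List<String> nameSeenByReaderSafe() {
        Thread loader = new Thread(() -> loadSafe(42L));
        loader.start();
        List<String> seen = new ArrayList<>(loadSafe(42L).name);
        joinQuietly(loader);
        return seen;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If this is what happens inside the mapping layer, the fix is not in `setReferenceEntity` itself but in making sure threads do not share one mutable session or cache.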
Here are screenshots from the debugger demonstrating this strange behavior:
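My wild guess is that lazy loading of instances in OGM doesn't play well with multithreading, but I honestly don't know how to fix it, or even whether this is the actual problem. Another oddity: apart from referenceEntity.name, we haven't noticed a similar problem in any other field.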
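If the mapping session (and the identity cache behind lazy loading) is indeed being shared by the parallel stream's worker threads, a common mitigation is to confine one session to each thread. The sketch below does that with `ThreadLocal`; `HypotheticalSession` is a made-up stand-in, not the Neo4j OGM `Session` API, and how a real session is opened, cleared and closed depends on your configuration. Separately, with Spring's default proxy-based transaction management, `@Transactional` on a non-public method such as the package-private `createSolrDocument` is silently ignored in Spring Framework versions before 6.0, which may also be relevant here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.LongStream;

final class PerThreadSessions {
    // Hypothetical stand-in for a mapping session: not thread-safe, owns a private identity map.
    static final class HypotheticalSession {
        private final Map<Long, String> identityMap = new HashMap<>();
        final int sessionId;

        HypotheticalSession(int sessionId) { this.sessionId = sessionId; }

        String load(long dbId) {
            // A plain HashMap is acceptable only because each session stays confined to one thread.
            return identityMap.computeIfAbsent(dbId, id -> "entity-" + id);
        }
    }

    private static final AtomicInteger SESSION_COUNTER = new AtomicInteger();

    // Each worker thread lazily creates its own session on first use.
    private static final ThreadLocal<HypotheticalSession> SESSION =
            ThreadLocal.withInitial(() -> new HypotheticalSession(SESSION_COUNTER.incrementAndGet()));

    /** Loads every id on a parallel stream and returns how many distinct sessions were used. */
    static int loadAll(long count, Map<Long, String> results) {
        Set<Integer> sessionsUsed = ConcurrentHashMap.newKeySet();
        LongStream.range(0, count).parallel().forEach(dbId -> {
            HypotheticalSession session = SESSION.get();
            sessionsUsed.add(session.sessionId);
            results.put(dbId, session.load(dbId));
        });
        return sessionsUsed.size();
    }
}
```

A trade-off to keep in mind: a `ThreadLocal` on the common ForkJoinPool outlives the stream, so the per-thread sessions are never closed here. A dedicated `ExecutorService` that is shut down after indexing may make the session lifecycle easier to manage.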