Lucene: can I replace the iterator with a method?

Problem description

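I have an idea:

  1. Find a pattern in the text.
  2. If the pattern is found, get its position in the text.

Part 1 is working.

Part 2 also works, but it uses an iterator, which means every term is visited before the one I need is reached. How can I go straight to my term and get its position in the text?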
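To make the difference between the two access styles concrete, here is a minimal sketch that does not use Lucene at all. It models one document's term vector as a sorted map from term to positions; the class `TermLookupSketch` and its methods are hypothetical names chosen for illustration, and the lowercase terms assume an analyzer that lowercases on indexing, as StandardAnalyzer does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class TermLookupSketch {

    // Hypothetical stand-in for the term vector of "Kite good world.":
    // each analyzed (lowercased) term maps to its token positions.
    static TreeMap<String, List<Integer>> termVector() {
        TreeMap<String, List<Integer>> terms = new TreeMap<>();
        terms.put("kite", Arrays.asList(0));
        terms.put("good", Arrays.asList(1));
        terms.put("world", Arrays.asList(2));
        return terms;
    }

    // Iterator style: walk the terms in sorted order until the wanted one
    // appears, and report how many terms had to be visited.
    static int termsVisitedByScan(TreeMap<String, List<Integer>> terms, String wanted) {
        int visited = 0;
        for (String term : terms.keySet()) {
            visited++;
            if (term.equals(wanted)) {
                break;
            }
        }
        return visited;
    }

    // Seek style: go straight to the wanted term; returns null if it is absent.
    static List<Integer> seek(TreeMap<String, List<Integer>> terms, String wanted) {
        return terms.get(wanted);
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> terms = termVector();
        System.out.println("scan visited " + termsVisitedByScan(terms, "world") + " terms");
        System.out.println("seek found positions " + seek(terms, "world"));
    }
}
```

In this model the scan touches all three terms before reaching "world", while the seek is a single lookup. What the question asks for is the Lucene counterpart of `seek`.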

My code

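import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;

public void methodFromStack() throws Exception {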
        
    Directory directory = new RAMDirectory();
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new StandardAnalyzer());
    IndexWriter writer = new IndexWriter(directory,indexWriterConfig);

    Document doc = new Document();
    // Field.Store.NO,Field.Index.ANALYZED,Field.TermVector.YES
    FieldType type = new FieldType();
    type.setStoreTermVectors(true);
    type.setStoreTermVectorPositions(true);
    type.setStoreTermVectorOffsets(true);
    type.setStored(true);
    type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
    Field fieldStore = new Field("tags","Kite good world.",type);
    doc.add(fieldStore);
    writer.addDocument(doc);
    writer.close();
    
    DirectoryReader reader = DirectoryReader.open(directory);
    IndexSearcher searcher = new IndexSearcher(reader);
    
    // Phrase search with slop (allowed gap between the terms)
    QueryParser queryParser = new QueryParser("tags",new StandardAnalyzer());
    Query query = queryParser.parse("\"Kite World\"~1");
    TopDocs results = searcher.search(query,1);
    
    for (ScoreDoc scoreDoc : results.scoreDocs) {

        Fields termVs = reader.getTermVectors(scoreDoc.doc);
        Terms f = termVs.terms("tags");

        TermsEnum te = f.iterator();
        PostingsEnum docsAndPosEnum = null;
        BytesRef bytesRef;

        // The iterator below prints every term, but I only need my matched term and its position
        while ((bytesRef = te.next()) != null) {
            docsAndPosEnum = te.postings(docsAndPosEnum,PostingsEnum.ALL);
            // for each term (iterator next) in this field (field)
            // iterate over the docs (should only be one)
            int nextDoc = docsAndPosEnum.nextDoc();
            assert nextDoc != DocIdSetIterator.NO_MORE_DOCS;
            final int fr = docsAndPosEnum.freq();
            final int p = docsAndPosEnum.nextPosition();
            final int o = docsAndPosEnum.startOffset();
            
            System.out.println("Word: " + bytesRef.utf8ToString());
            System.out.println("Position: " + p + ", startOffset: " + o
                    + ", length: " + bytesRef.length + ", Freq: " + fr);
        
            if(fr > 1){
                for(int iter = 1; iter <= fr-1; iter++) {
                    System.out.println("Position: " + docsAndPosEnum.nextPosition());
                }
          
            }


        }
    }
}

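(I know that older versions of the Lucene library had the classes TermFreqVector and TermPositionVector, but this changed in the move from version 3 to version 4. After those changes, the iterator approach is the only one I have found.

Environment: Windows, NetBeans, Maven, Lucene 7.4.0)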

Solution

The fix is to use the seekExact method. You can test it with this code:

        // ref is the BytesRef of the term to look up. It is not defined in this
        // snippet; it is assumed here to hold the analyzed (lowercased) form of
        // the term, e.g. new BytesRef("kite").
        TermsEnum te = f.iterator();
        PostingsEnum docsAndPosEnum = null;
        if (te.seekExact(ref)) {
            // Positioned directly on the term; no need to walk the other terms
            docsAndPosEnum = te.postings(docsAndPosEnum, PostingsEnum.ALL);
            int nextDoc = docsAndPosEnum.nextDoc();
            assert nextDoc != DocIdSetIterator.NO_MORE_DOCS;
            final int freq = docsAndPosEnum.freq();
            final int pos = docsAndPosEnum.nextPosition();
            final int o = docsAndPosEnum.startOffset();

            System.out.println("Word: " + ref.utf8ToString());
            System.out.println("Position: " + pos + ", startOffset: " + o
                    + ", length: " + ref.length + ", Freq: " + freq);
        }
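Two details are easy to miss when going from the matched phrase to the lookup. seekExact matches the term exactly, so the query words need to be in their indexed form ("kite", not "Kite"), and a term that occurs more than once has more than one position. The sketch below models both points without Lucene; `PhrasePositionsSketch` and `positionsFor` are hypothetical names, the map stands in for a term vector, and the hand-written lowercasing is a simplification of running the words through the same analyzer that was used for indexing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class PhrasePositionsSketch {

    // Hypothetical helper: normalize each word of the phrase to its assumed
    // indexed form (lowercase), look it up directly, and collect every
    // position of each term that is present. Absent terms are skipped.
    static Map<String, List<Integer>> positionsFor(String phrase,
                                                   Map<String, List<Integer>> termVector) {
        Map<String, List<Integer>> found = new LinkedHashMap<>();
        for (String word : phrase.trim().split("\\s+")) {
            String term = word.toLowerCase(Locale.ROOT);
            List<Integer> positions = termVector.get(term);
            if (positions != null) {
                found.put(term, new ArrayList<>(positions));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Stand-in term vector for the text "Kite good world kite"
        Map<String, List<Integer>> termVector = new TreeMap<>();
        termVector.put("kite", Arrays.asList(0, 3));
        termVector.put("good", Arrays.asList(1));
        termVector.put("world", Arrays.asList(2));

        System.out.println(positionsFor("Kite World", termVector));
    }
}
```

With Lucene itself, the equivalent of the inner loop would be calling nextPosition() as many times as freq() reports after a successful seekExact, the same way the loop in the question's code already does.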