How to enable scrolling in Elasticsearch

Problem description

I have a web API that serves its data from Elasticsearch.

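  • My URL is https://data.emp.com/employees
  • The index contains 50 employees (documents)
  • Each scroll should add 7 more employees: 7, 14, 21, ..., 49, 50
  • That is, the first scroll shows 7 employees, the next shows 14, and so on up to 49 and finally 50
  • Right now, the API behind that URL fetches all 50 employees in one go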
    def elastic_search():
        """
        Return full search using match_all
        """
        try:
            full_search = es.search(index="employees", scroll='2m', size=10, body={"query": {"match_all": {}}})
            hits_search = full_search['hits']['hits']
            return hits_search
        except Exception as e:
            logger.exception("Error" + str(e))
            raise

I then modified the code above as follows:

        sid =  search["_scroll_id"]
        scroll_size = search['hits']['total']
        scroll_size = scroll_size['value']
        # Start scrolling
        while (scroll_size > 0):

            #print("Scrolling...")
            page = es.scroll(scroll_id = sid,scroll = '1m')

            #print("Hits : ",len(page["hits"]["hits"]))
            
            # Update the scroll ID
            sid = page['_scroll_id']
        
            # Get the number of results that we returned in the last scroll
            scroll_size = len(page['hits']['hits'])
            search_text = page['hits']['hits']
            print (search_text)

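My API returns [] because search_text is empty on the last iteration. In the logs, the employees are printed in groups of 7, but my web API URL keeps loading and finally shows a blank page.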

Please help me update the elastic_search function so that it returns the right "hits_search".

Solution

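If you have 10k documents or fewer (the default index.max_result_window limit), I think Elasticsearch's from and size parameters will do the job for you. But if you still want to use the scroll API, this is what you need: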

    from elasticsearch import Elasticsearch

    # assumed connection details; point this at your own cluster
    client = Elasticsearch("http://localhost:9200")

    # declare a filter query dict object
    match_all = {
        "size": 7,
        "query": {
            "match_all": {}
        }
    }

    # make a search() request to get all docs in the index
    resp = client.search(
        index='employees',
        body=match_all,
        scroll='2s'  # length of time to keep search context
    )

    # running total of documents seen so far
    doc_count = 0

    # process the first 7 documents here from resp
    for doc in resp['hits']['hits']:
        print("\n", doc['_id'], doc['_source'])
        doc_count += 1
        print("DOC COUNT:", doc_count)

    # keep track of the scroll _id returned by the last response
    old_scroll_id = resp['_scroll_id']

    # keep requesting pages until one comes back with no hits
    while len(resp['hits']['hits']):

        # make a request using the Scroll API; the page size was fixed
        # by the initial search, so scroll() takes no size argument
        resp = client.scroll(
            scroll_id=old_scroll_id,
            scroll='2s'  # length of time to keep search context
        )

        # the scroll _id may change between requests, so refresh it
        old_scroll_id = resp['_scroll_id']

        # iterate over the document hits for each 'scroll'
        for doc in resp['hits']['hits']:
            print("\n", doc['_source'])
            doc_count += 1
            print("DOC COUNT:", doc_count)
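
The loop above only prints each page. The empty [] in the question most likely comes from handing back the last page, which is always empty because that is what ends the loop. A minimal sketch of the alternative is to accumulate every page into one list and return that. The function name collect_all_hits and its parameters are hypothetical, and the client is passed in rather than created here; it assumes the elasticsearch-py search(), scroll() and clear_scroll() calls with the body= style used in this thread.

```python
def collect_all_hits(es, index, page_size=7, keep_alive="2m"):
    """Scroll through `index` and return every hit as a single list.

    `es` is assumed to be an elasticsearch-py client (hypothetical helper,
    not part of the library).
    """
    # Initial search: opens the scroll context and returns the first page.
    resp = es.search(
        index=index,
        scroll=keep_alive,
        body={"size": page_size, "query": {"match_all": {}}},
    )
    scroll_id = resp["_scroll_id"]
    all_hits = []
    try:
        # An empty page means the scroll is exhausted.
        while resp["hits"]["hits"]:
            all_hits.extend(resp["hits"]["hits"])
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            # The scroll id may change, so always keep the latest one.
            scroll_id = resp["_scroll_id"]
    finally:
        # Release the server-side search context once we are done.
        es.clear_scroll(scroll_id=scroll_id)
    return all_hits
```

With something like this in place, the elastic_search function from the question could return collect_all_hits(es, "employees") instead of the hits of a single search call.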

See the reference: https://kb.objectrocket.com/elasticsearch/how-to-use-python-to-make-scroll-queries-to-get-all-documents-in-an-elasticsearch-index-752
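
For the from/size route mentioned at the start of this answer, a rough sketch of serving one page of 7 employees per request could look like the following. The function name get_employee_page and the 1-based page argument are hypothetical; it assumes an elasticsearch-py client passed in as es and puts "from" and "size" inside the request body, in the same body= style used in this thread.

```python
def get_employee_page(es, page, page_size=7, index="employees"):
    """Return the hits for one 1-based page using from/size pagination.

    Hypothetical helper; `es` is assumed to be an elasticsearch-py client.
    """
    if page < 1:
        raise ValueError("page must be >= 1")
    body = {
        # Offset of the first document on this page.
        "from": (page - 1) * page_size,
        "size": page_size,
        "query": {"match_all": {}},
    }
    resp = es.search(index=index, body=body)
    return resp["hits"]["hits"]
```

With 50 documents and a page size of 7, pages 1 to 7 would hold 7 employees each and page 8 the remaining one. If the front end should show 7, then 14, then 21 employees, it can append each new page to what it already displays.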