Paging through hundreds of millions of MongoDB records: a Java implementation

1. Preparing the environment

  1.1 Download MongoDB

  1.2 Start MongoDB

     C:\mongodb\bin\mongod --dbpath D:\mongodb\data

  1.3 Download Robo 3T, a GUI tool for MongoDB

2. Preparing the data

  

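Add the MongoDB Java driver as a dependency (Maven coordinates):

    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongo-java-driver</artifactId>
        <version>3.6.1</version>
    </dependency>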

Run the following Java code to insert the test data:

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.MongoClient;
    import com.mongodb.MongoException;

    // The class name is illustrative; the original post shows only the method body.
    public class InsertData {

        public static void main(String[] args) {
            try {
                /**** Connect to MongoDB ****/
                // Since 2.10.0, uses MongoClient
                MongoClient mongo = new MongoClient("localhost", 27017);

                /**** Get database ****/
                // if the database doesn't exist, MongoDB will create it for you
                DB db = mongo.getDB("www");

                /**** Get collection / table from 'www' ****/
                // if the collection doesn't exist, MongoDB will create it for you
                DBCollection table = db.getCollection("person");

                /**** Insert ****/
                // create a document to store key and value
                BasicDBObject document = null;
                for (int i = 0; i < 100000000; i++) {
                    document = new BasicDBObject();
                    document.put("name", "mkyong" + i);
                    document.put("age", 30);
                    document.put("sex", "f");
                    table.insert(document);
                }

                /**** Done ****/
                System.out.println("Done");
                mongo.close();
            } catch (MongoException e) {
                // The 3.x MongoClient constructor no longer declares
                // UnknownHostException, so only MongoException is caught.
                e.printStackTrace();
            }
        }
    }
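The loop above sends one document per call, which means 100,000,000 round trips. One common way to cut that down is to group documents into batches. The following is only a sketch of the grouping logic using the standard library: `BatchInsertSketch`, `person` and `generateInBatches` are hypothetical names, and the documents are plain maps rather than driver objects. With the 3.x driver the `sink` would presumably convert each map with `new BasicDBObject(map)` and pass the resulting list to `DBCollection.insert(List)`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class BatchInsertSketch {

    /** Builds one document with the same fields as the insert loop. */
    static Map<String, Object> person(long i) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "mkyong" + i);
        doc.put("age", 30);
        doc.put("sex", "f");
        return doc;
    }

    /**
     * Generates `total` documents and hands them to `sink` in lists of at
     * most `batchSize` elements. Returns the number of batches delivered.
     */
    static long generateInBatches(long total, int batchSize,
                                  Consumer<List<Map<String, Object>>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long batches = 0;
        List<Map<String, Object>> batch = new ArrayList<>(batchSize);
        for (long i = 0; i < total; i++) {
            batch.add(person(i));
            if (batch.size() == batchSize) {
                sink.accept(batch);          // a real sink would insert the batch here
                batches++;
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {              // flush the final, shorter batch
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        long n = generateInBatches(10, 4, b -> System.out.println("batch of " + b.size()));
        System.out.println(n + " batches");
    }
}
```

The batch size is a tuning knob: larger batches mean fewer round trips but more memory held per call.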

3.分页查询

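 The traditional skip()/limit() approach becomes slow once the data set is large, because the server still has to walk past every skipped document, so it is a poor fit here. An alternative, borrowed from logstash-input-mongodb, is to remember the last _id returned and ask only for the documents after it: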

    # Excerpt from logstash-input-mongodb (lib/logstash/inputs/mongodb.rb)
    ... = collection.find({:_id => {:$gt => ...

    collection_name = collection[:name]
    @logger.debug("collection_data is: #{@collection_data}")
    last_id = @collection_data[index][:last_id]
    #@logger.debug("last_id is #{last_id}", :index => index, :collection => collection_name)
    # get batch of events starting at the last_place if it is set

    last_id_object = last_id
    if since_type == 'id'
      last_id_object = BSON::ObjectId(last_id)
    elsif since_type == 'time'
      if last_id != ''
        last_id_object = Time.at(last_id)
      end
    end
    cursor = get_cursor_for_collection(@mongodb, collection_name, batch_size)
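Using `_id` as the paging key works because an ObjectId starts with a 4-byte Unix timestamp (in seconds), so ids generated later generally sort after ids generated earlier. As a small stdlib-only illustration, the sketch below decodes that timestamp from the hex form of an ObjectId. `ObjectIdTimestampSketch` and `creationTime` are hypothetical names, and the sample id is the starting `_id` used in the Java code in this post.

```java
import java.time.Instant;

class ObjectIdTimestampSketch {

    /** Extracts the creation time stored in the first 4 bytes (8 hex chars) of an ObjectId. */
    static Instant creationTime(String objectIdHex) {
        if (objectIdHex == null || objectIdHex.length() != 24) {
            throw new IllegalArgumentException("expected a 24-character hex string");
        }
        long epochSeconds = Long.parseLong(objectIdHex.substring(0, 8), 16);
        return Instant.ofEpochSecond(epochSeconds);
    }

    public static void main(String[] args) {
        System.out.println(creationTime("5bda8f66ef2ed979bab041aa"));
    }
}
```

Ordering across several client machines is only approximate, since ids created within the same second differ in their random and counter bytes. That does not affect paging: any fixed total order on `_id` is enough to visit each document once.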

Java implementation:

    import java.util.List;

    import org.bson.types.ObjectId;

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;
    import com.mongodb.MongoClient;
    import com.mongodb.MongoException;

    public class Test {

        public static void main(String[] args) {
            int pageSize = 50000;

            try {
                /**** Connect to MongoDB ****/
                // Same connection, database and collection as in the data preparation step
                MongoClient mongo = new MongoClient("localhost", 27017);
                DB db = mongo.getDB("www");
                DBCollection table = db.getCollection("person");

                long cnt = table.count();
                //System.out.println(table.getStats());
                long page = getPageSize(cnt, pageSize);
                // _id to resume after (a value from the author's data set)
                ObjectId lastIdObject = new ObjectId("5bda8f66ef2ed979bab041aa");

                for (long i = 0; i < page; i++) {
                    long start = System.currentTimeMillis();
                    DBCursor dbObjects = getCursorForCollection(table, lastIdObject, pageSize);
                    // The cursor is lazy: toArray() is what actually fetches the page,
                    // so it has to run before the elapsed time is read.
                    List<DBObject> objs = dbObjects.toArray();
                    System.out.println("Query " + (i + 1) + " took "
                            + (System.currentTimeMillis() - start) / 1000 + " s");
                    if (objs.isEmpty()) {
                        break; // nothing left after lastIdObject
                    }
                    lastIdObject = (ObjectId) objs.get(objs.size() - 1).get("_id");
                }
            } catch (MongoException e) {
                e.printStackTrace();
            }
        }

        public static DBCursor getCursorForCollection(DBCollection collection, ObjectId lastIdObject, int pageSize) {
            BasicDBObject query = new BasicDBObject();
            // With no starting _id, leave the filter empty so that the first
            // document is not skipped by the $gt comparison.
            if (lastIdObject != null) {
                query.append("_id", new BasicDBObject("$gt", lastIdObject));
            }
            BasicDBObject sort = new BasicDBObject("_id", 1);
            return collection.find(query).sort(sort).limit(pageSize);
        }

        /** Despite its name, returns the number of pages needed for cnt records. */
        public static long getPageSize(long cnt, int pageSize) {
            return cnt % pageSize == 0 ? cnt / pageSize : cnt / pageSize + 1;
        }

    }
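The paging logic itself can be tried without a MongoDB server. In the sketch below a `TreeMap` stands in for the `_id` index: each page asks for the keys strictly greater than the last key seen, which mirrors the `$gt` plus `limit` query. `KeysetPagingSketch`, `nextPage` and `pageSizes` are hypothetical names, and `Long` keys replace ObjectIds for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class KeysetPagingSketch {

    /** Returns up to pageSize keys strictly greater than lastKey (from the start when lastKey is null). */
    static List<Long> nextPage(NavigableMap<Long, String> index, Long lastKey, int pageSize) {
        NavigableMap<Long, String> rest =
                (lastKey == null) ? index : index.tailMap(lastKey, false);
        List<Long> page = new ArrayList<>();
        for (Long key : rest.keySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    /** Walks the whole index page by page and records the size of every page. */
    static List<Integer> pageSizes(NavigableMap<Long, String> index, int pageSize) {
        List<Integer> sizes = new ArrayList<>();
        Long lastKey = null;
        while (true) {
            List<Long> page = nextPage(index, lastKey, pageSize);
            if (page.isEmpty()) {
                break;                               // no keys left after lastKey
            }
            sizes.add(page.size());
            lastKey = page.get(page.size() - 1);     // resume after the last key of this page
        }
        return sizes;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> index = new TreeMap<>();
        for (long id = 1; id <= 10; id++) {
            index.put(id, "mkyong" + id);
        }
        System.out.println(pageSizes(index, 4));     // prints [4, 4, 2]
    }
}
```

Stopping on an empty page, rather than on a precomputed page count, keeps the loop correct even if documents are added or removed while paging.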

4. Lessons learned

  1. I accidentally left out a $ sign, so the query returned no data, and I lost some time tracking down the cause. The correct line is:

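     query.append("_id",new BasicDBObject("$gt",lastIdObject));

  2. Creating indexes

     Create an ordinary single-field index: db.collection.ensureIndex({field:1/-1}); 1 means ascending, -1 means descending. (ensureIndex has been deprecated since MongoDB 3.0 in favor of createIndex, which takes the same specification.)

     Example: db.articles.ensureIndex({title:1})

     The author reports that index creation failed when the field was wrapped in double quotes, although the mongo shell normally accepts a quoted key such as {"title":1}.

     Show the current indexes: db.collection.getIndexes();

     Example: db.articles.getIndexes();

     Drop a single index: db.collection.dropIndex({field:1/-1});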

  3. Execution plan

   db.student.find({"name":"dd1"}).explain()
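On the first lesson above (the missing `$`): MongoDB does not reject `{"_id": {"gt": x}}`. Without the `$`, the inner document is no longer an operator expression but an embedded document to compare for equality, so the query silently matches nothing. A cheap guard is to check condition documents before sending them. The sketch below uses plain maps; `OperatorKeySketch` and `isOperatorCondition` are hypothetical names and not part of the MongoDB driver.

```java
import java.util.Collections;
import java.util.Map;

class OperatorKeySketch {

    /** True when the condition is non-empty and every key is an operator such as $gt. */
    static boolean isOperatorCondition(Map<String, ?> condition) {
        if (condition.isEmpty()) {
            return false;
        }
        for (String key : condition.keySet()) {
            if (!key.startsWith("$")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> correct = Collections.singletonMap("$gt", "5bda8f66ef2ed979bab041aa");
        Map<String, Object> typo = Collections.singletonMap("gt", "5bda8f66ef2ed979bab041aa");
        System.out.println(isOperatorCondition(correct)); // true
        System.out.println(isOperatorCondition(typo));    // false
    }
}
```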

 References:

【1】https://github.com/phutchins/logstash-input-mongodb/blob/master/lib/logstash/inputs/mongodb.rb

【2】https://www.cnblogs.com/yxlblogs/p/4930308.html

【3】https://docs.mongodb.com/manual/reference/method/db.collection.ensureIndex/
