Elasticsearch Study Notes


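This post records my process of learning and using Elasticsearch. It covers installation and configuration, and accessing Elasticsearch from Python. Tip: Elasticsearch is installed on a Linux server.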

Installing and Configuring Elasticsearch

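  1. Download the installation package: Download Elasticsearch;

  2. Extract the archive: tar -xvf elasticsearch-7.15.2-linux-x86_64.tar.gz;

  3. Edit elasticsearch.yml in the config directory to allow access from the local network: network.host: 0.0.0.0;

  4. Change to the bin directory and run ./elasticsearch to start Elasticsearch. The following errors appear: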

    ERROR: [2] bootstrap checks failed. You must address the points described in the following [2] lines before starting Elasticsearch.
    bootstrap check failure [1] of [2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
    bootstrap check failure [2] of [2]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
    
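  5. The first failure occurs because the kernel's per-process limit on memory-mapped areas (vm.max_map_count) is too low for Elasticsearch. Edit the system configuration file /etc/sysctl.conf to set vm.max_map_count=262144 and reboot (or run sysctl -w vm.max_map_count=262144 as root to apply the value immediately);

  6. The second failure occurs because none of the following settings is specified:

    • discovery.seed_hosts: the list of hosts in the cluster;
    • discovery.seed_providers: supplies the list of cluster hosts from a configuration file;
    • cluster.initial_master_nodes: the nodes that take part in the master election when the cluster first starts; required in production.

    Edit elasticsearch.yml and set discovery.seed_hosts: ["192.168.1.xx"] and cluster.initial_master_nodes: ["192.168.1.xx:9300"];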

  7. Restart Elasticsearch and open http://192.168.1.xx:9200/ in a browser:

    {
    "name" : "xxxx",
    "cluster_name" : "elasticsearch",
    "cluster_uuid" : "4urQVMKyQgGl0oTM_wvgjQ",
    "version" : {
    	"number" : "7.15.2",
    	"build_flavor" : "default",
    	"build_type" : "tar",
    	"build_hash" : "93d5a7f6192e8a1a12e154a2b81bf6fa7309da0c",
    	"build_date" : "2021-11-04T14:04:42.515624022Z",
    	"build_snapshot" : false,
    	"lucene_version" : "8.9.0",
    	"minimum_wire_compatibility_version" : "6.8.0",
    	"minimum_index_compatibility_version" : "6.0.0-beta1"
    },
    "tagline" : "You Know, for Search"
    }
    
  8. Installation and configuration succeeded!
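
The two bootstrap failures from step 4 can be checked before launching the node. The following is a minimal sketch and not part of Elasticsearch: the helper names `configured_keys`, `bootstrap_problems`, and `read_max_map_count` are hypothetical, and the parser assumes the flat `key: value` style used in this post (it is not a general YAML parser). The threshold and setting names are copied from the error messages above.

```python
from pathlib import Path

# Threshold and setting names taken from the bootstrap-check messages in step 4.
MIN_MAX_MAP_COUNT = 262144
DISCOVERY_KEYS = {
    "discovery.seed_hosts",
    "discovery.seed_providers",
    "cluster.initial_master_nodes",
}


def configured_keys(yml_text):
    """Collect keys from flat `key: value` lines (assumption: no nested YAML)."""
    keys = set()
    for line in yml_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, and lines that are not key/value pairs.
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        keys.add(stripped.partition(":")[0].strip())
    return keys


def bootstrap_problems(max_map_count, yml_text):
    """Return messages mirroring the two failures seen in step 4."""
    problems = []
    if max_map_count < MIN_MAX_MAP_COUNT:
        problems.append(
            f"vm.max_map_count [{max_map_count}] is too low, "
            f"increase to at least [{MIN_MAX_MAP_COUNT}]"
        )
    if not DISCOVERY_KEYS & configured_keys(yml_text):
        problems.append(
            "none of these is configured: " + ", ".join(sorted(DISCOVERY_KEYS))
        )
    return problems


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the current kernel limit (Linux only)."""
    return int(Path(path).read_text().strip())
```

On the server, something like `bootstrap_problems(read_max_map_count(), Path("config/elasticsearch.yml").read_text())` should return an empty list once steps 5 and 6 are done; the relative config path here is an assumption about where the archive was extracted.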

Accessing Elasticsearch from Python

  1. Install the Elasticsearch Python client: conda install elasticsearch;

  2. Connect to Elasticsearch:

    from elasticsearch import Elasticsearch
    
    es = Elasticsearch(hosts=['192.168.1.xx'])
    # ignore=400: do not raise if creation fails with HTTP 400 because an index with the same name already exists
    result = es.indices.create(index='news_and_events', ignore=400)
    print(result)
    
  3. Install the elasticsearch-analysis-ik plugin so that Elasticsearch can segment Chinese text into words:

    ./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.15.2/elasticsearch-analysis-ik-7.15.2.zip
    

    After the plugin is installed, restart Elasticsearch;

  4. Populate the index with data:

    from elasticsearch import Elasticsearch
    from tqdm import tqdm
    
    # Import local modules
    from database import SessionLocal
    from models import Record
    
    es = Elasticsearch(hosts=['192.168.1.xx'])
    mapping = {
    	'properties': {
    		'title': {
    			'type': 'text',
    			'analyzer': 'ik_max_word',
    			'search_analyzer': 'ik_max_word'
    		},
    		'content': {
    			'type': 'text',
    			'analyzer': 'ik_max_word',
    			'search_analyzer': 'ik_max_word'
    		}
    	}
    }
    # An Elasticsearch index can be compared to a database in a relational system
    es.indices.create(index='news_and_events', ignore=400)
    # doc_type can be compared to a relation schema
    es.indices.put_mapping(index='news_and_events', doc_type='records', body=mapping, include_type_name=True)
    
    # Query the database and export all news items and announcements
    db = SessionLocal()
    result_set = db.query(Record).all()
    
    for record in tqdm(result_set):
    	data = {
    		'record_id': record.record_id,
    		'title': record.title,
    		'content': record.content
    	}
    	es.create(index='news_and_events', doc_type='records', id=record.record_id, document=data)
    
    # Close the database and Elasticsearch connections
    db.close()
    es.close()
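
The loop above sends one request per record. As a possible alternative, the Python client ships a `helpers.bulk` function that batches documents into fewer requests. The sketch below is an assumption-laden illustration rather than the method used in this post: `record_actions` and `index_records` are hypothetical names, the record fields are the ones from the loop above, and `doc_type` is left out because mapping types are deprecated in Elasticsearch 7.x.

```python
from types import SimpleNamespace


def record_actions(records, index="news_and_events"):
    """Yield one bulk action per record, using the same fields as the loop above."""
    for record in records:
        yield {
            "_op_type": "create",  # like es.create: fails if the id already exists
            "_index": index,
            "_id": record.record_id,
            "_source": {
                "record_id": record.record_id,
                "title": record.title,
                "content": record.content,
            },
        }


def index_records(es, records):
    """Send the actions in batches; returns (success_count, errors)."""
    # Imported here so the generator can be tried without the client installed.
    from elasticsearch import helpers
    return helpers.bulk(es, record_actions(records), raise_on_error=False)


# Stand-in for a database row, for illustration only.
sample = [SimpleNamespace(record_id=1, title="t", content="c")]
actions = list(record_actions(sample))
```

With a live connection, `index_records(es, result_set)` would replace the `for` loop; the generator keeps memory use low because actions are produced one at a time.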
    
  5. Query the data:

    # Search title and content; matches in title count twice as much (title^2)
    q = {
    	'query': {
    		'multi_match': {
    			'query': '成都重庆双城经济圈',
    			'fields': ['title^2', 'content']
    		}
    	}
    }
    results = es.search(body=q, index='news_and_events', doc_type='records')
    

    The result returned by Elasticsearch:

    {
    	"took": 10,
    	"timed_out": false,
    	"_shards": {
    		"total": 1,
    		"successful": 1,
    		"skipped": 0,
    		"failed": 0
    	},
    	"hits": {
    		"total": {
    			"value": 623,
    			"relation": "eq"
    		},
    		"max_score": 57.087215,
    		"hits": [
    			{
    				"_index": "news_and_events",
    				"_type": "records",
    				"_id": "2508",
    				"_score": 57.087215,
    				"_source": {
    					"record_id": 2508,
    					"title": "关于成渝地区双城经济圈创新创业峰会的通知",
    					"content": "各学院,各位老师和同学:\n现转发重庆市教育委员会和四川省教育厅等六部门联合发布的《关于举办\"智创巴蜀\"首届成渝地区双城经济圈创新创业峰会的通知》,详见附件。欢迎积极参加。\n联系人:x老师\n联系电话:xxxxxxxx\n教务处\nxxxx年xx月xx日\n附件1-川渝6部门联合发峰会正式文件"
    				}
    			}
    		]
    	}
    }
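
The response is a nested dictionary, so the matching documents have to be pulled out of `hits.hits`. A minimal sketch of one way to do that follows; `summarize_hits` is a hypothetical helper, and the sample dictionary is a trimmed copy of the response above. Note that a search returns at most 10 hits by default, so the list can be much shorter than the reported total.

```python
def summarize_hits(response):
    """Return the total match count and (id, score, title) for each returned hit."""
    hits = response["hits"]
    total = hits["total"]["value"]
    rows = [
        (hit["_id"], hit["_score"], hit["_source"].get("title"))
        for hit in hits["hits"]
    ]
    return total, rows


# Trimmed copy of the response shown above.
response = {
    "hits": {
        "total": {"value": 623, "relation": "eq"},
        "max_score": 57.087215,
        "hits": [
            {
                "_id": "2508",
                "_score": 57.087215,
                "_source": {
                    "record_id": 2508,
                    "title": "关于成渝地区双城经济圈创新创业峰会的通知",
                },
            }
        ],
    }
}

total, rows = summarize_hits(response)
print(total, len(rows))  # 623 1
```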
    
