The previous posts covered Elasticsearch core concepts, underlying principles, installation and basic usage, index management, DSL queries, aggregation queries, the document write and read paths, cluster deployment, cluster planning and operations experience, and data backup and migration. Today I will walk through the curl commands commonly used with Elasticsearch.
These commands come up constantly in day-to-day work and are genuinely practical, so they are worth a post of their own.
Test environment:
Centos7.2 64位
jdk1.8.0_91
elasticsearch-2.2.0
Creating an index works with either PUT or POST:
[hadoop@h153 ~]$ curl -XPUT 'http://192.168.205.153:9200/index_name/'
{"acknowledged":true}
[hadoop@h153 ~]$ curl -XPOST http://192.168.205.153:9200/hui/employee/1 -d '{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}'
{"_index":"hui","_type":"employee","_id":"1","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}
[hadoop@h153 ~]$ vi qiang.json
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
[hadoop@h153 ~]$ curl -XPOST '192.168.205.153:9200/qiang' -d @qiang.json
{"acknowledged":true}
On PUT vs. POST: PUT is idempotent and POST is not, so PUT is the natural fit for updates (and for creating documents at a known ID), while POST suits creating documents where Elasticsearch assigns the ID.
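The practical difference can be sketched without a cluster: below, a temporary directory stands in for the index, a fixed-ID write models PUT, and a generated-ID write models POST. All names here are illustrative; nothing talks to Elasticsearch.

```shell
store=$(mktemp -d)                                  # stands in for the index
put() { printf '%s\n' "$2" > "$store/$1"; }         # PUT /hui/emp/<id>: caller fixes the ID
post() { printf '%s\n' "$1" > "$(mktemp "$store/auto.XXXXXX")"; }  # POST: auto-generated ID
put 1 '{"name":"zs"}'
put 1 '{"name":"zs"}'        # repeating the PUT is a no-op: still one document
post '{"name":"zs"}'
post '{"name":"zs"}'         # repeating the POST creates a second document
ls "$store" | wc -l          # 3 documents: "1" plus two auto-generated IDs
```

Running the PUT any number of times leaves the store in the same state, which is exactly what idempotence means; each POST adds a new document.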
Notes on creating indexes:
Index names must be entirely lowercase, must not start with an underscore, and must not contain commas. If you do not explicitly give the document an ID, Elasticsearch generates a random one, and you must use POST:
[hadoop@h153 ~]$ curl -XPOST http://192.168.205.153:9200/hui/emp/ -d '{"first_name" : "John"}'
{"_index":"hui","_type":"emp","_id":"AV8MoiLdq8PZVDlk6J74","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}
If you want to be sure you are creating a brand-new document (and fail otherwise):
[hadoop@h153 ~]$ curl -XPUT http://192.168.205.153:9200/hui/emp/2?op_type=create -d '{"name":"zs","age":25}'
{"_index":"hui","_type":"emp","_id":"2","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}
[hadoop@h153 ~]$ curl -XPUT http://192.168.205.153:9200/hui/emp/2/_create -d '{"name":"laoxiao","age":25}'
{"error":{"root_cause":[{"type":"document_already_exists_exception","reason":"[emp][2]: document already exists","shard":"2","index":"hui"}],"type":"document_already_exists_exception","reason":"[emp][2]: document already exists","shard":"2","index":"hui"},"status":409}
# Note: if a document with that ID already exists, Elasticsearch returns an HTTP 409 Conflict response.
[hadoop@h153 ~]$ curl -XGET http://192.168.205.153:9200/hui/employee/1?pretty
{
"_index" : "hui",
"_type" : "employee",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
}
}
[hadoop@h153 ~]$ curl -XGET http://192.168.205.153:9200/hui/emp/2?pretty
{
"_index" : "hui",
"_type" : "emp",
"_id" : "2",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "zs",
"age" : 25
}
}
[hadoop@h153 ~]$ curl -i 'http://192.168.205.153:9200/hui/emp/1?pretty'
HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 76
{
"_index" : "hui",
"_type" : "emp",
"_id" : "1",
"found" : false
}
[hadoop@h153 ~]$ curl -XGET http://192.168.205.153:9200/hui/employee/1?_source=name,age
{"_index":"hui","_type":"employee","_id":"1","_version":1,"found":true,"_source":{"age":25}}
[hadoop@h153 ~]$ curl -XGET http://192.168.205.153:9200/hui/employee/1?_source
{"_index":"hui","_type":"employee","_id":"1","_version":1,"found":true,"_source":{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}}
You can see the document we indexed inside the returned hits. A search returns the top 10 results by default:
[hadoop@h153 ~]$ curl -XGET http://192.168.205.153:9200/hui/employee/_search
{"took":21,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"hui","_type":"employee","_id":"1","_score":1.0,"_source":{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}}]}}
[hadoop@h153 ~]$ curl -XGET http://192.168.205.153:9200/hui/_search?q=last_name:Smith
{"took":26,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.30685282,"hits":[{"_index":"hui","_type":"employee","_id":"1","_score":0.30685282,"_source":{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}}]}}
Searching without any query condition (matches everything in the index):
[hadoop@h153 ~]$ curl -XGET http://192.168.205.153:9200/hui/_search
{"took":5,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":3,"max_score":1.0,"hits":[{"_index":"hui","_type":"emp","_id":"AV8MoiLdq8PZVDlk6J74","_score":1.0,"_source":{"first_name" : "John"}},{"_index":"hui","_type":"emp","_id":"2","_score":1.0,"_source":{"name":"zs","age":25}},{"_index":"hui","_type":"employee","_id":"1","_score":1.0,"_source":{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}}]}}
DSL is short for Domain Specific Language. Only the simplest example is shown here; queries can be far more elaborate, adding filters and other conditions:
[hadoop@h153 ~]$ curl -XGET http://192.168.205.153:9200/hui/employee/_search -d '{"query":{"match":{"last_name":"Smith"}}}'
{"took":5,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.30685282,"hits":[{"_index":"hui","_type":"employee","_id":"1","_score":0.30685282,"_source":{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}}]}}
The mget API fetches several documents in one request. First create one more document: curl -XPOST http://192.168.205.153:9200/website/blog/2 -d '{"first_name":"John" , "last_name":"Smith"}', then query:
[hadoop@h153 ~]$ curl -XGET http://192.168.205.153:9200/_mget?pretty -d '{"docs":[{"_index":"hui","_type":"emp","_id":2,"_source":"name"},{"_index":"website","_type":"blog","_id":2}]}'
# response:
{
"docs" : [ {
"_index" : "hui",
"_type" : "emp",
"_id" : "2",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "zs"
}
}, {
"_index" : "website",
"_type" : "blog",
"_id" : "2",
"_version" : 1,
"found" : true,
"_source" : {
"first_name" : "John",
"last_name" : "Smith"
}
} ]
}
If the documents you need all live in the same _index (or the same _index/_type), you can set a default /_index or /_index/_type in the URL:
curl -XGET http://192.168.205.153:9200/hui/_mget?pretty -d '{"docs":[{"_type":"employee","_id":1},{"_type":"emp","_id":2}]}'
If all the documents share both _index and _type, simply pass an array of ids in the request body:
curl -XGET http://192.168.205.153:9200/hui/emp/_mget?pretty -d '{"ids":["1","2"]}'
Count all documents in the cluster with the _cat/count API:
[hadoop@h153 ~]$ curl -XGET 'http://192.168.205.153:9200/_cat/count'
1508265400 02:36:40 44542
You can also count the documents under a specific index, e.g. hui:
[hadoop@h153 ~]$ curl -XGET 'http://192.168.205.153:9200/_cat/count/hui'
1508265602 02:40:02 0
List information about all indexes:
[hadoop@h153 ~]$ curl -XGET 'http://192.168.205.153:9200/_cat/indices?pretty'
green open test 5 1 0 0 1.5kb 795b
green open qiang 5 0 0 0 795b 795b
green open hui 5 1 4 0 41.6kb 20.8kb
And to check a single index:
[hadoop@h153 ~]$ curl -XGET 'http://192.168.205.153:9200/_cat/indices/hui?pretty'
green open hui 5 1 4 0 41.6kb 20.8kb
Next, the command to force a segment merge:
[hadoop@h153 ~]$ curl -XPOST 'http://192.168.205.153:9200/hui/_forcemerge?max_num_segments=1'
{"_shards":{"total":5,"successful":5,"failed":0}}
If you only need to know whether a document exists, use HEAD instead of GET; only the HTTP headers come back:
[hadoop@h153 ~]$ curl -i -XHEAD http://192.168.205.153:9200/hui/emp/1
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
Elasticsearch accepts PUT or POST for updates: if a document with the given ID already exists, the request replaces it and bumps _version.
For a partial update, which can add new fields or modify existing ones, you must POST to the _update endpoint:
[hadoop@h153 ~]$ curl -XPOST http://192.168.205.153:9200/hui/emp/2/_update -d '{"doc":{"city":"beijing","car":"BMW"}}'
{"_index":"hui","_type":"emp","_id":"2","_version":2,"_shards":{"total":2,"successful":1,"failed":0}}
[hadoop@h153 ~]$ curl -XDELETE http://192.168.205.153:9200/hui/emp/2/
{"found":true,"_index":"hui","_type":"emp","_id":"2","_version":3,"_shards":{"total":2,"successful":1,"failed":0}}
Note: when the document exists, found is true and _version increases by 1. This internal version number is how Elasticsearch guarantees that operations arriving at different nodes are applied in the correct order.
Like mget, the bulk API executes multiple requests in a single call. Format:
- action: index / create / update / delete
- metadata: _index, _type, _id
- request body: _source (not needed for delete)
{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n
Note when using curl -XPOST -d: you cannot write a literal \n escape inside the JSON string; press Enter to produce real newlines instead:
[hadoop@h153 ~]$ curl -XPOST '192.168.205.153:9200/_bulk?pretty' -H 'Content-Type: application/json' -d'
> { "delete": { "_index": "hui", "_type": "employee", "_id": "1" }}
> { "create": { "_index": "website", "_type": "blog", "_id": "123" }}
> { "title": "My first blog post" }
> { "index": { "_index": "website", "_type": "blog" }}
> { "title": "My second blog post" }
> { "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
> { "doc" : {"title" : "My updated blog post"} }
> '
{
"took" : 197,
"errors" : false,
"items" : [ {
"delete" : {
"_index" : "hui",
"_type" : "employee",
"_id" : "1",
"_version" : 2,
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"status" : 200,
"found" : true
}
}, {
"create" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 1,
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"status" : 201
}
}, {
"create" : {
"_index" : "website",
"_type" : "blog",
"_id" : "AV8XEEpF4TG7AylMbq5H",
"_version" : 1,
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"status" : 201
}
}, {
"update" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 2,
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"status" : 200
}
} ]
}
The difference between create and index: if the document already exists, create fails with a document-already-exists error, while index succeeds and overwrites it.
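A local sketch of that distinction, with a directory standing in for the index (the helper names and the 409 message are illustrative, not real Elasticsearch output):

```shell
store=$(mktemp -d)
index_op() { printf '%s\n' "$2" > "$store/$1"; }    # index: create or overwrite
create_op() {                                       # create: refuse existing IDs
  if [ -e "$store/$1" ]; then
    echo "document_already_exists (409)"
    return 1
  fi
  printf '%s\n' "$2" > "$store/$1"
}
index_op 123 '{"title":"v1"}'
create_op 123 '{"title":"v2"}' || true   # refused: ID 123 already exists
index_op 123 '{"title":"v2"}'            # succeeds and overwrites
```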
Using a file as the request body:
vi requests
curl -XPOST localhost:9200/_bulk --data-binary @requests
(PUT works too.) How much data can one bulk call carry? There is no hard limit, but a common rule of thumb is roughly 5-15 MB per request, then tune by measuring.
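A minimal sketch of preparing such a file: the heredoc guarantees every line, including the last, ends with a real newline (the bulk API requires that), and --data-binary preserves those newlines where plain -d would strip them. The curl line is commented out because it needs a live cluster; the IDs reuse earlier examples.

```shell
cat > requests <<'EOF'
{ "index": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }
{ "delete": { "_index": "hui", "_type": "employee", "_id": "1" }}
EOF
# curl -XPOST 'http://192.168.205.153:9200/_bulk?pretty' --data-binary @requests
wc -l < requests    # 3: every action/body line is newline-terminated
```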
Ordinary relational databases use pessimistic concurrency control (PCC): a row is locked before it is read, so only the thread that holds the lock can modify it.
Elasticsearch uses optimistic concurrency control (OCC): access to a document is never blocked. If the underlying data changed between our read and our write, the update simply fails, and the application decides how to handle the conflict: re-read the fresh data and retry, or surface the conflict to the user.
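That conflict-then-retry flow can be sketched in shell; the stored version variable stands in for Elasticsearch, and every name below is illustrative:

```shell
stored_version=3     # another writer already bumped the document to version 3
version=2            # the stale version we read earlier
update_with_version() {        # succeeds only when the caller's version matches
  if [ "$1" -eq "$stored_version" ]; then
    stored_version=$((stored_version + 1))
    return 0
  fi
  return 1                     # models the 409 version-conflict response
}
for attempt in 1 2 3; do
  if update_with_version "$version"; then
    echo "updated on attempt $attempt"
    break
  fi
  version=$stored_version      # re-read the document to get the fresh version
done
```

Here the first attempt conflicts, the loop re-reads the version, and the second attempt succeeds.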
How Elasticsearch does version control with its internal version numbers: first fetch the document you want to modify and note its _version:
[hadoop@h153 ~]$ curl -XGET http://192.168.205.153:9200/hui/emp/2
{"_index":"hui","_type":"emp","_id":"2","_version":2,"found":true,"_source":{"name":"zs","age":25}}
Then pass that version number along with the update:
[hadoop@h153 ~]$ curl -XPUT http://192.168.205.153:9200/hui/emp/2?version=3 -d '{"name":"zs","age":25}'
{"error":{"root_cause":[{"type":"version_conflict_engine_exception","reason":"[emp][2]: version conflict, current [2], provided [3]","shard":"2","index":"hui"}],"type":"version_conflict_engine_exception","reason":"[emp][2]: version conflict, current [2], provided [3]","shard":"2","index":"hui"},"status":409}
[hadoop@h153 ~]$ curl -XPUT http://192.168.205.153:9200/hui/emp/2?version=2 -d '{"name":"zs","age":25}'   # full overwrite
{"_index":"hui","_type":"emp","_id":"2","_version":3,"_shards":{"total":2,"successful":1,"failed":0},"created":false}
[hadoop@h153 ~]$ curl -XPOST http://192.168.205.153:9200/hui/emp/2/_update?version=3 -d '{"doc":{"city":"beijing","car":"BMW"}}'   # partial update
{"_index":"hui","_type":"emp","_id":"2","_version":4,"_shards":{"total":2,"successful":1,"failed":0}}
Note: if the version you pass does not match the document's current version, the update fails.
If your own database already stores a version number, or a timestamp that can act as one, append version_type=external to the URL to use those values.
Note: the version must be an integer greater than 0 and less than 9223372036854775807 (the largest positive Java long).
With external versions, Elasticsearch no longer checks that _version equals the value in the request; it checks that the current _version is smaller than the supplied value, and only then does the request succeed. Example:
[hadoop@h153 ~]$ curl -XPUT 'http://192.168.205.153:9200/hui/emp/2?version=10&version_type=external' -d '{"name":"laoxiao"}'
{"_index":"hui","_type":"emp","_id":"2","_version":10,"_shards":{"total":2,"successful":1,"failed":0},"created":false}
Note: the quotes around the URL must not be omitted here, otherwise the shell interprets the & in the query string and the command fails.
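The external-version rule (accept only a strictly greater version number) can be sketched locally; the function and messages below are illustrative, not real API responses:

```shell
current_version=10
external_put() {     # external_put <incoming_version>
  if [ "$1" -gt "$current_version" ]; then
    current_version=$1
    echo "accepted"
  else
    echo "version_conflict (409)"
  fi
}
external_put 10      # equal is not greater, so this is rejected
external_put 12      # strictly greater: accepted, stored version becomes 12
```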
Site plugins (browser-based UIs):
bigdesk provides real-time node monitoring, covering the JVM, the Linux host, and Elasticsearch itself. Open it in a browser:
http://192.168.205.153:9200/_plugin/bigdesk/
Install the head plugin:
bin/plugin install mobz/elasticsearch-head
Then open:
http://192.168.205.153:9200/_plugin/head/
(Works in Chrome; the 360 browser is not supported.)
List the installed plugins with:
[hadoop@h153 ~]$ ./elasticsearch-2.2.0/bin/plugin list
Installed plugins in /home/hadoop/elasticsearch-2.2.0/plugins:
- ik
- head
Check cluster health from the command line: curl http://localhost:9200/_cluster/health?pretty
[hadoop@h153 ~]$ curl '192.168.205.153:9200/_cluster/health?pretty'
{
"cluster_name" : "my-application",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 50.0
}
[hadoop@localhost elasticsearch-2.2.0]$ curl -XGET http://192.168.205.142:9200/
{
"name" : "node-1",
"cluster_name" : "my-application",
"version" : {
"number" : "2.2.0",
"build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe",
"build_timestamp" : "2016-01-27T13:32:39Z",
"build_snapshot" : false,
"lucene_version" : "5.4.1"
},
"tagline" : "You Know, for Search"
}
[hadoop@localhost elasticsearch-2.2.0]$ curl -XGET 192.168.205.142:9200/_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1489745233 18:07:13 my-application green 3 3 6 3 0 0 0 0 - 100.0%