Elasticsearch Basic APIs, Part 1

This article collects basic API usage examples compiled while learning Elasticsearch and records how to call some of the most commonly used Elasticsearch APIs.

REST API

Data queries

Index a document. If the document already exists it is updated; otherwise it is created.

json

PUT negacorp/employee/4
{
  "name" : "zhangsan",
  "sex" : "nan",
  "hobby" : "no hobby"
}

Response: created is true when the document was created and false when an existing document was updated.

json

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 2,
  "created": false
}
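
A client usually branches on this flag. The sketch below assumes the response has already been parsed into a Python dict; the helper name `describe_index_result` is made up for this illustration. Some versions report a string `result` field instead (as in the update response shown later in this article), so the helper checks both.

```python
def describe_index_result(response: dict) -> str:
    """Return 'created' or 'updated' for a parsed index response."""
    # Assumption: a response carries either a string `result`
    # ("created" / "updated") or a boolean `created`.
    if "result" in response:
        return response["result"]
    return "created" if response.get("created") else "updated"


sample = {"_index": "website", "_type": "blog", "_id": "123",
          "_version": 2, "created": False}
print(describe_index_result(sample))
```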

Basic query

GET negacorp/employee/_search?q=zhangsan

Field query

GET negacorp/employee/_search?q=name:zhangsan
GET /blog/posts/_search?q=title:php+content:php

Retrieve only specific fields

GET /website/blog/123?_source=title,text

Retrieve only the original stored document (_source)

GET negacorp/employee/1/_source

Check whether a document exists

HEAD negacorp/employee/1

Rewriting query-string searches with the query DSL

json

// Match query: documents whose hobby contains basketball or too (match combines terms with OR by default)
GET /negacorp/employee/_search
{
  "query": {
    "match": {
      "hobby": "basketball too"
    }
  }
}
// Phrase query: documents that contain the exact phrase "basketball too"
GET /negacorp/employee/_search
{
  "query": {
    "match_phrase": {
      "hobby": "basketball too"
    }
  }
}
// Phrase prefix query: like match_phrase, but the last term (too) is matched as a prefix
GET /negacorp/employee/_search
{
  "query": {
    "match_phrase_prefix": {
      "hobby": "basketball too"
    }
  }
}
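
To make the difference between the three queries concrete, here is a toy model in plain Python. It assumes lowercase, whitespace-separated tokens, which is far simpler than a real analyzer, and the function names are made up for this illustration; it is not how Elasticsearch implements these queries.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(field, query):
    # Default operator is OR: one matching term is enough.
    doc = set(tokens(field))
    return any(term in doc for term in tokens(query))


def match_phrase(field, query):
    # All terms must appear next to each other, in order.
    doc, q = tokens(field), tokens(query)
    if not q:
        return False
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


def match_phrase_prefix(field, query):
    # Like match_phrase, but the last query term only has to be a prefix.
    doc, q = tokens(field), tokens(query)
    if not q:
        return False
    for i in range(len(doc) - len(q) + 1):
        window = doc[i:i + len(q)]
        if window[:-1] == q[:-1] and window[-1].startswith(q[-1]):
            return True
    return False


print(match("too much soccer", "basketball too"))
print(match_phrase("too much basketball", "basketball too"))
print(match_phrase_prefix("basketball tools", "basketball too"))
```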

bool query

Keyword	Meaning
must	Equivalent to AND in a relational database
should	Equivalent to OR in a relational database
must_not	Equivalent to NOT in a relational database

json

// Documents that must contain both `basketball` and `too`
GET /negacorp/employee/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "hobby": "basketball"
          }
        },{
          "match": {
            "hobby": "too"
          }
        }
      ]
    }
  }
}

// Documents that contain basketball or too
GET /negacorp/employee/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "hobby": "basketball"
          }
        },{
          "match": {
            "hobby": "too"
          }
        }
      ]
    }
  }
}

// Documents that contain neither basketball nor too
GET /negacorp/employee/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "hobby": "basketball"
          }
        },{
          "match": {
            "hobby": "too"
          }
        }
      ]
    }
  }
}

// Documents that contain basketball but not too
GET /negacorp/employee/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "hobby": "basketball"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "hobby": "too"
          }
        }
      ]
    }
  }
}
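
When these bodies are built in application code, a small helper keeps the nesting manageable. The following is a minimal sketch that only assembles the JSON body; the helper names `match` and `bool_query` are hypothetical and no request is sent.

```python
import json


def match(field, text):
    """Build a single match clause."""
    return {"match": {field: text}}


def bool_query(must=(), should=(), must_not=()):
    """Build a search body with a bool query from lists of clauses."""
    clauses = {"must": must, "should": should, "must_not": must_not}
    # Only emit the clause types that were actually given.
    return {"query": {"bool": {k: list(v) for k, v in clauses.items() if v}}}


# Contains basketball but not too, as in the last example above.
body = bool_query(must=[match("hobby", "basketball")],
                  must_not=[match("hobby", "too")])
print(json.dumps(body, indent=2))
```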

Filtering queries

json

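// The filtered query below was deprecated in 2.0 and removed in 5.0; use the bool query with a filter clause that follows instead
GET negacorp/employee/_search
{
  "query": {
    "filtered": {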
      "filter": {
        "range": {
          "age": {
            "gt": 30
          }
        }
      },
      "query": {
        "match": {
          "last_name": "smith"
        }
      }
    }
  }
}
// Filtered search using a bool query
GET negacorp/employee/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "gt": 0
          }
        }
      },
      "must": {
        "match": {
          "hobby": "basketball"
        }
      }
    }
  }
}

Aggregations

json

// Group by name, then compute the average age of each group
GET negacorp/employee/_search
{
  "size": 0,
  "aggs": {
    "group_by_name": {
      "terms": {
        "field": "name",
        "size": 10
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
// Group by name, then order the groups by average age, descending
GET negacorp/employee/_search
{
  "size": 0,
  "aggs": {
    "group_by_name": {
      "terms": {
        "field": "name",
        "size": 10,
        "order": {
          "avg_age": "desc"
        }
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
// Group by age range, then compute the average age of each range
GET negacorp/employee/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {"from": 0,"to": 1},
          {"from": 1,"to": 2},
          {"from": 2,"to": 3},
          {"from": 3,"to": 5}
        ]
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
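
For readers more used to application code than to the aggregation DSL, the second request above (terms on name, avg on age, ordered by avg_age descending) corresponds roughly to the following in-memory computation. This is only a sketch of the semantics over a list of dicts; the sample documents and the function name are invented for the example.

```python
from collections import defaultdict


def avg_age_by_name(docs, size=10):
    """Group docs by name, average age per group, order by that average (desc)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["name"]].append(doc["age"])
    buckets = [
        {"key": name, "doc_count": len(ages), "avg_age": sum(ages) / len(ages)}
        for name, ages in groups.items()
    ]
    buckets.sort(key=lambda b: b["avg_age"], reverse=True)
    return buckets[:size]


# Hypothetical sample data.
docs = [
    {"name": "zhangsan", "age": 30},
    {"name": "lisi", "age": 40},
    {"name": "zhangsan", "age": 20},
]
result = avg_age_by_name(docs)
print(result)
```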

Sorting. When no sort is specified, results are sorted by the relevance field _score by default.

json

// Sorted search
GET negacorp/employee/_search?_source
{
  "sort": [
    {
      "age": {
        "order": "desc","mode":"min"
      }
    },
    "name",
    "_score"
  ]
}

Values of the mode option

mode	Rule
min	Minimum value
max	Maximum value
sum	Sum
avg	Average

Nested sorting

The missing setting: when a document lacks the sort field, missing can be set to _first, _last, or a custom value.

json

GET negacorp/employee/_search?_source
{
  "sort": [
    {
      "test": {
        "order": "desc","mode":"min","missing": "this is default value"
      }
    },
    "name",
    "_score"
  ]
}
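
The effect of `_first` and `_last` can be pictured with a small in-memory sort. This sketch models only those two values for a descending sort on a numeric field; the function name and sample documents are assumptions made for the example.

```python
def sort_by_age_desc(docs, missing="_last"):
    """Sort by age descending; docs without age go first or last."""
    present = [d for d in docs if "age" in d]
    absent = [d for d in docs if "age" not in d]
    present.sort(key=lambda d: d["age"], reverse=True)
    return absent + present if missing == "_first" else present + absent


# Hypothetical sample data: one document has no age field.
docs = [{"name": "a", "age": 20}, {"name": "b"}, {"name": "c", "age": 35}]
print([d["name"] for d in sort_by_age_desc(docs)])
print([d["name"] for d in sort_by_age_desc(docs, missing="_first")])
```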

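fielddata is false by default, mainly because enabling it on a text field loads the field's values into memory, which raises memory usage considerably. Sorting or aggregating on a text field requires fielddata to be enabled in the mapping.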

json

POST /negacorp/_mapping/employee
{
  "employee": {
    "properties": {
      "field_name": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}

_source filtering

json

// Do not return _source
GET /blog/posts/_search?_source
{
  "_source": false,
  "query": {
    "term": {
      "title": {
        "value": "php"}
    }
  }
}

// Use wildcards to control which fields are returned
{
    "_source": "obj.*",
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
or
{
    "_source": [ "obj1.*", "obj2.*" ],
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

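// You can also use includes and excludes (named include and exclude in older versions) to include or exclude specific fields for full control over what is returned
{
    "_source": {
        "includes": [ "obj1.*", "obj2.*" ],
        "excludes": [ "*.description" ]
    },
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}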
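
The include/exclude logic can be approximated in a few lines of Python. The sketch below works on a document that has already been flattened to dotted paths and uses `fnmatch` as a stand-in for Elasticsearch's wildcard matching, which is an assumption; the function name and sample document are invented for the example.

```python
from fnmatch import fnmatchcase


def filter_source(flat_doc, includes=("*",), excludes=()):
    """Keep paths matching any include pattern and no exclude pattern."""
    kept = {}
    for path, value in flat_doc.items():
        if not any(fnmatchcase(path, p) for p in includes):
            continue
        if any(fnmatchcase(path, p) for p in excludes):
            continue
        kept[path] = value
    return kept


# Hypothetical flattened document.
doc = {"obj1.name": "a", "obj1.description": "b", "obj2.id": 1, "user": "kimchy"}
print(filter_source(doc, includes=["obj1.*", "obj2.*"], excludes=["*.description"]))
```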

json

GET negacorp/employee/_search
{
  "aggs": {
    "hobby": {
      "terms": {
        "field": "name"
      }
    }
  }
}

json

PUT negacorp/_mapping/employee
{
  "properties": {
    "name":{
      "type":"text",
      "fielddata":true
    }
  }
}

Delete a document

DELETE negacorp/employee/4

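Partial update API: adds fields or modifies some of a document's fields. Internally it follows the same steps as replacing the whole document: retrieve, change, reindex. Unlike a PUT, all of these steps happen inside the shard, so no extra network round trips are needed, which removes that network overhead and improves performance.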

json

POST negacorp/employee/4/_update
{
  "doc": {
    "tags":["zainan","dashu"],
    "hobby":"pingpang"
  }
}

The response is:

json

{
  "_index": "negacorp",
  "_type": "employee",
  "_id": "4",
  "_version": 5,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 5,
  "_primary_term": 2
}

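Updating data with a script (MVEL in early versions; Painless is the default scripting language since 5.0)

Increment the value of age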

json

POST negacorp/employee/4/_update
{
  "script": "ctx._source.age+=1"
}
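
The retrieve, change, reindex cycle behind both kinds of update can be sketched in memory as follows. This is a simplified model, not Elasticsearch's implementation: the merge is shallow, the version is always incremented, and the script is represented by a plain Python callable. All names are hypothetical.

```python
def partial_update(stored, doc=None, script=None):
    """Simulate an update: copy the source, apply changes, store a new version."""
    source = dict(stored["_source"])          # retrieve
    if doc:
        source.update(doc)                    # change: merge the partial doc
    if script:
        script(source)                        # change: run the script on the source
    return {"_source": source,                # reindex as a new version
            "_version": stored["_version"] + 1}


stored = {"_source": {"name": "zhangsan", "age": 30}, "_version": 4}
updated = partial_update(
    stored,
    doc={"hobby": "pingpang"},
    script=lambda src: src.update(age=src["age"] + 1),  # like ctx._source.age+=1
)
print(updated)
```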

Retrieve multiple documents in one request

json

GET /_mget
{
  "docs": [
    {
      "_index": "blog",
      "_type":"posts",
      "_id":1
    },
    {
      "_index": "negacorp",
      "_type":"employee",
      "_id":1
    }
  ]
}

Retrieve multiple documents from different types within the same index

json

GET /blog/_mget
{
  "docs": [
    {
      "_type":"posts",
      "_id":1
    },
    {
      "_type":"user",
      "_id":2
    }
  ]
}

Retrieve multiple documents within the same type

json

GET /blog/posts/_mget
{
  "docs": [
    {
      "_id":1
    },
    {
      "_id":2
    }
  ]
}

// or
GET /blog/posts/_mget
{
  "ids": ["1", "2"]
}

Note: when a document is not found, the HTTP status is still 200; whether each document was found is indicated by its found field.
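
Because the status code says nothing about individual documents, client code has to inspect each entry. A minimal sketch, assuming the mget response has been parsed into a dict; the sample response and the helper name are made up for the example.

```python
def split_mget(response):
    """Return (found_ids, missing_ids) from a parsed mget response."""
    found, missing = [], []
    for doc in response["docs"]:
        (found if doc.get("found") else missing).append(doc["_id"])
    return found, missing


# Hypothetical response: document 2 does not exist.
response = {"docs": [
    {"_index": "blog", "_type": "posts", "_id": "1", "found": True,
     "_source": {"title": "php"}},
    {"_index": "blog", "_type": "posts", "_id": "2", "found": False},
]}
print(split_mget(response))
```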

Batching multiple requests (bulk)

json

POST /_bulk
{"create": {"_index": "negacorp","_type":"employee","_id":"5"}}
{"name":"zhangsan's son"}
{"delete": {"_index": "negacorp","_type":"employee","_id":"5"}}
{"update": {"_index": "blog","_type":"posts","_id":"1"}}
{"doc":{"title":"this is my first blog post"}}

A bulk request can also carry a default index or type in the URL

json

POST /negacorp/employee/_bulk
{"create": {"_index": "negacorp","_type":"employee","_id":"5"}}
{"name":"zhangsan's son"}
{"delete": {"_index": "negacorp","_type":"employee","_id":"5"}}

Shard operations

Specify the number of primary and replica shards when creating an index

json

PUT /blog
{
  "settings": {
    "number_of_shards": 12,
    "number_of_replicas": 1
  }
}

Change the number of replica shards

json

PUT /post/_settings
{
  "number_of_replicas": 2
}

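Note: the number of primary shards is part of the routing hash that Elasticsearch uses to decide which shard a document is indexed into, so once set it cannot be changed. The number of replica shards can be changed, however, so you can raise it to increase the cluster's throughput.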
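
The routing rule is usually described as shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document's _id. The sketch below illustrates why changing the divisor breaks lookups. It uses `zlib.crc32` purely as a stand-in hash (Elasticsearch uses a different hash function), and `pick_shard` is a hypothetical name.

```python
import zlib


def pick_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Map a document id to a shard number (stand-in hash, not Elasticsearch's)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


ids = [str(i) for i in range(100)]
# Documents that would be looked up on a different shard
# if the primary shard count changed from 12 to 13.
moved = [i for i in ids if pick_shard(i, 12) != pick_shard(i, 13)]
print(len(moved), "of", len(ids), "documents would no longer be found")
```

With the same id and the same shard count the result is always the same shard, which is what makes lookups by id possible; changing the count changes the result for most ids.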

Analyzers

The analyze API

json

GET /_analyze
{
  "analyzer": "ik_smart",
  "text": "php是世界上最好的语言"
}

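Request fields of the analyze API:

Field	Description
analyzer	The analyzer to use
text	The text to analyze

Fields returned by the analyzer:

Field	Description
token	The smallest unit the analyzer splits the text into; this is the term actually stored in the index.
start_offset	Start position of the token in the original text.
end_offset	End position of the token in the original text.
type	Type of the token, for example CN_WORD or ENGLISH.
position	Position of the token in the sequence of tokens.

Output of the ik_smart analyzer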

json

{
  "tokens": [
    {
      "token": "php",
      "start_offset": 0,
      "end_offset": 3,
      "type": "ENGLISH",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 3,
      "end_offset": 4,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "世界上",
      "start_offset": 4,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "最好",
      "start_offset": 7,
      "end_offset": 9,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "的",
      "start_offset": 9,
      "end_offset": 10,
      "type": "CN_CHAR",
      "position": 4
    },
    {
      "token": "语言",
      "start_offset": 10,
      "end_offset": 12,
      "type": "CN_WORD",
      "position": 5
    }
  ]
}

Output of the ik_max_word analyzer

json

{
  "tokens": [
    {
      "token": "php",
      "start_offset": 0,
      "end_offset": 3,
      "type": "ENGLISH",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 3,
      "end_offset": 4,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "世界上",
      "start_offset": 4,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "世界",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "上",
      "start_offset": 6,
      "end_offset": 7,
      "type": "CN_CHAR",
      "position": 4
    },
    {
      "token": "最好",
      "start_offset": 7,
      "end_offset": 9,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "的",
      "start_offset": 9,
      "end_offset": 10,
      "type": "CN_CHAR",
      "position": 6
    },
    {
      "token": "语言",
      "start_offset": 10,
      "end_offset": 12,
      "type": "CN_WORD",
      "position": 7
    }
  ]
}

Output of the standard analyzer

json

{
  "tokens": [
    {
      "token": "php",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<alphanum>",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 3,
      "end_offset": 4,
      "type": "<ideographic>",
      "position": 1
    },
    {
      "token": "世",
      "start_offset": 4,
      "end_offset": 5,
      "type": "<ideographic>",
      "position": 2
    },
    {
      "token": "界",
      "start_offset": 5,
      "end_offset": 6,
      "type": "<ideographic>",
      "position": 3
    },
    {
      "token": "上",
      "start_offset": 6,
      "end_offset": 7,
      "type": "<ideographic>",
      "position": 4
    },
    {
      "token": "最",
      "start_offset": 7,
      "end_offset": 8,
      "type": "<ideographic>",
      "position": 5
    },
    {
      "token": "好",
      "start_offset": 8,
      "end_offset": 9,
      "type": "<ideographic>",
      "position": 6
    },
    {
      "token": "的",
      "start_offset": 9,
      "end_offset": 10,
      "type": "<ideographic>",
      "position": 7
    },
    {
      "token": "语",
      "start_offset": 10,
      "end_offset": 11,
      "type": "<ideographic>",
      "position": 8
    },
    {
      "token": "言",
      "start_offset": 11,
      "end_offset": 12,
      "type": "<ideographic>",
      "position": 9
    }
  ]
}

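Comparison of the three analyzers:

Analyzer	Splitting rule
ik_smart	Chinese analyzer; splits the text at the coarsest granularity
ik_max_word	Chinese analyzer; splits the text at the finest granularity
standard	The default analyzer; splits text on the word boundaries defined by the Unicode Consortium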

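Data mapping

A mapping can be thought of simply as the set of field data types for the documents of a given type.

Field types supported by Elasticsearch

Name	Field type
String	string (text and keyword since 5.0)
Whole number	byte, short, integer, long
Floating point	float, double
Boolean	boolean
Date	date

For a field that is not mapped explicitly, the field type is inferred from the JSON value:

JSON type	Field type
Boolean	boolean
Whole number	long
Floating point	double
String containing a valid date	date
String	text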

View the field mapping

GET /blog/_mapping/posts

The returned mapping

json

{
  "blog": {
    "mappings": {
      "posts": {
        "properties": {
          "__soft_deleted": {
            "type": "integer"
          },
          "content": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "id": {
            "type": "long"
          },
          "is_valid": {
            "type": "long"
          },
          "text": {
            "type": "text",
            "fields": {
              "title": {
                "type": "text"
              }
            }
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "user_id": {
            "type": "long"
          }
        }
      }
    }
  }
}

Custom field mapping

json

PUT /blog_test
{
  "mappings": {
    "doc":{
      "properties":{
        "title":{
          "type":"text",
          "analyzer":"ik_smart",
          "search_analyzer":"standard"
        }
      }
    }
  }
}

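Search

By default a search never times out. You can use the timeout parameter to bound the query time and keep responses fast. Once a timeout is set, Elasticsearch returns as many results as it has gathered within that time. Note that when the timeout expires it only returns the results collected so far; the query itself is not necessarily terminated immediately.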

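Searching multiple indices and types

URL	Description
/_search	Search all types in all indices
/blog/_search	Search all types in the blog index
/negacorp,blog/_search	Search all types in the blog and negacorp indices
/n*,b*/_search	Search all types in every index whose name starts with n or b
/negacorp/user/_search	Search all documents of type user in the negacorp index
/negacorp,blog/user,posts/_search	Search all documents of type user and type posts in the negacorp and blog indices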

Pagination

GET /blog/posts/_search?size=5&from=0

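Note: search results are ranked by relevance by default, so every search involves a sort. In a distributed search, avoid deep pagination, because every search has to visit multiple shards. Take an index with ten shards as an example: each shard first collects the top from+size results from its own data and sends them to the coordinating node, which sorts everything it received and returns the final page to the user, so that the result is correct.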
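
The cost of deep pagination is easier to see in a toy simulation of this scatter/gather step. In the sketch below each shard is just a list of relevance scores; the function name and the sample data are invented for the example, and real shards return document references rather than bare scores.

```python
import heapq


def paged_search(shards, from_, size):
    """Toy scatter/gather over per-shard score lists."""
    top_n = from_ + size
    # Each shard sorts locally and returns its own top from+size scores.
    per_shard = [heapq.nlargest(top_n, scores) for scores in shards]
    fetched = sum(len(part) for part in per_shard)
    # The coordinating node merges everything, sorts again, and keeps one page.
    merged = heapq.nlargest(top_n, (s for part in per_shard for s in part))
    return merged[from_:from_ + size], fetched


shards = [[0.9, 0.1, 0.5], [0.8, 0.7, 0.2]]
page, fetched = paged_search(shards, from_=2, size=2)
print(page, fetched)
```

With ten shards, from=10000 and size=10, the coordinating node would have to sort up to 10 × 10010 = 100100 entries to return ten hits.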

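How a plain query works: by default, when a document is indexed, Elasticsearch concatenates the values of all indexed fields into one big string and indexes it as a separate field, _all. A search that does not name a field generally searches the _all field. This describes versions in which _all is enabled; it is disabled by default for indices created in 6.0 and later.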
