Elasticsearch: Odds and Ends

1. How do I find out which nodes hold my index's data?

Note that an index is made up of multiple segments; once you know which node each segment lives on, you know which nodes hold the index's data:

curl -X GET "127.0.0.1:9200/_cat/segments/mail-w3svc1-2020.05.31?v"
index                  shard prirep ip           segment generation docs.count docs.deleted     size size.memory committed searchable version compound
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _2q0          3528     369102            0  249.7mb      113048 true      true       8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _2r7          3571     319454            0  216.6mb       99914 true      true       8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _2zd          3865     320827            0  219.8mb      105560 true      true       8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _31k          3944    7768065            0    4.7gb     3059591 true      true       8.4.0   false
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _35i          4086     119684            0   84.9mb       43316 true      true       8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _37q          4166      53366            0   38.2mb           0 true      false      8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _39q          4238     242418            0  170.2mb       78499 true      true       8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3cc          4332     109124            0   77.3mb       40302 true      true       8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3f3          4431     396442            0  272.7mb      132842 true      true       8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3fj          4447      26586            0   19.7mb           0 true      false      8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3g5          4469      40256            0     29mb           0 true      false      8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3gk          4484      48216            0   34.4mb           0 true      false      8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3hu          4530      39447            0   27.7mb           0 true      false      8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3i2          4538     116472            0   81.3mb       42064 true      true       8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3jc          4584      29177            0   21.1mb           0 true      false      8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3kt          4637     209783            0  146.1mb       71957 true      true       8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3m5          4685      44456            0   32.5mb           0 true      false      8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3ny          4750      51744            0     37mb           0 true      false      8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3of          4767     112861            0   79.2mb       40737 true      true       8.4.0   true
mail-w3svc1-2020.05.31 0     p      172.29.4.171 _3pc          4800      35665            0   25.8mb           0 true      false      8.4.0   true
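
Incidentally, if you only need shard-to-node placement rather than per-segment detail, the _cat/shards API lists the node hosting each shard of an index directly:

curl -X GET "127.0.0.1:9200/_cat/shards/mail-w3svc1-2020.05.31?v"
# output columns include index, shard, prirep, state, docs, store, ip and node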

Some related background:

1. Segment memory is resident in memory (the JVM heap) and cannot be reclaimed by GC, so keep an eye on how large it gets.
2. Segment memory can also be viewed per node, as follows:

curl -X GET "127.0.0.1:9200/_cat/nodes?v&h=name,port,sm"
name           port      sm
it_elk_node169 9300  20.2mb
it_elk_node171 9300 107.3mb
it_elk_node168 9300  61.1mb
it_elk_node170 9300  18.1mb
it_elk_node167 9300  24.4mb
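
To see which indices that segment memory belongs to, it can also be summed per index. A sketch (I believe the segments.memory column is available on recent 6.x/7.x releases, but verify on your version):

curl -X GET "127.0.0.1:9200/_cat/indices?v&h=index,segments.count,segments.memory&s=segments.memory:desc"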


ElasticSearch: Fixing 429 es_rejected_execution_exception

The following message shows up in the logs (it is printed by Logstash's elasticsearch output as it retries writes to ES):

[INFO ] 2020-06-04 11:04:45.093 [[main]>worker16] elasticsearch - retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of processing of [74000580][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[mail-w3svc1-2020.05.31][0]] containing [10] requests, target allocation id: 6OPCe3kOTWqG5k68lRozUA, primary term: 1 on EsThreadPoolExecutor[name = it_elk_node171/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@31aa695f[Running, pool size = 40, active threads = 40, queued tasks = 200, completed tasks = 31488344]]"})

Root-cause analysis:

To protect the cluster from overload, Elasticsearch caps the size of its request queues by design, which adds stability and reliability. Without such a limit, a misbehaving or malicious client could easily take the whole cluster down. So under heavy concurrent load, once traffic exceeds what a single Elasticsearch instance can process, the server triggers this protective mechanism, refuses to execute new requests, and throws EsRejectedExecutionException.

The protection and the exception are governed by the thread pool and its companion queue in the Elasticsearch implementation. In the log above, the thread pool Elasticsearch allocates for indexing has pool size = 40 and queue capacity = 200: when all 40 threads are busy and more than 200 tasks are already buffered in the queue, any new task is simply dropped and EsRejectedExecutionException is thrown.

Elasticsearch logs this message at INFO level, so it is not a serious problem in itself; according to the official Elasticsearch documentation it sometimes needs no attention at all, but there are still effective things you can do about it.

Diagnosing and fixing it

According to the official Elasticsearch documentation, a 429 can only be addressed in the following ways:

1. Pause the bulk write for 3-5 seconds, then retry (see the sketch after this list).

2. Increase the parameters of the write thread pool.
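
A minimal shell sketch of option 1, assuming the bulk payload sits in an NDJSON file (the file name and retry count are placeholders; note that a 200 response can still carry per-item 429s, which this sketch does not inspect):

BULK_FILE=bulk.ndjson
for attempt in 1 2 3 4 5; do
  # capture only the HTTP status code of the bulk request
  code=$(curl -s -o /dev/null -w "%{http_code}" -H 'Content-Type: application/x-ndjson' \
         -XPOST "127.0.0.1:9200/_bulk" --data-binary "@${BULK_FILE}")
  [ "$code" != "429" ] && break    # stop on success or on a non-retryable error
  sleep 5                          # pause a few seconds before retrying
done
echo "last HTTP status: $code"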

The write thread pool parameters can be inspected with:

curl -XGET "127.0.0.1:9200/_cat/thread_pool/write?v&h=node_name,ip,name,active,queue,rejected,completed,core,type,pool_size,queue_size"
node_name      ip           name  active queue rejected completed core type  pool_size queue_size
it_elk_node169 172.29.4.169 write      0     0     1174  22801844      fixed        40        200
it_elk_node168 172.29.4.168 write      0     0     1811  32356307      fixed        40        200
it_elk_node167 172.29.4.167 write     30     0    11797  39593375      fixed        40        200
it_elk_node171 172.29.4.171 write      0     0    12304  28194291      fixed        40        200
it_elk_node170 172.29.4.170 write      1     0      743  25107047      fixed        40        200

Here active and queue are the threads and queued tasks currently in use, while pool_size and queue_size are the configured limits.

Since Elasticsearch 5, thread pool settings can no longer be changed through the cluster settings API (from the official note):

Thread pool settings are now node-level settings. As such, it is not possible to update thread pool settings via the cluster settings API.

The fix

size is the number of threads; it must not exceed the number of CPU cores.

vim elasticsearch.yml    # add the following
thread_pool:
    write:
        size: 30
        queue_size: 1000

Then restart ES. Query again after the restart and the new queue_size is in effect (in the snapshot below, two of the nodes had not been restarted yet and still show the old queue_size of 200):

curl -X GET "127.0.0.1:9200/_cat/thread_pool/write?v&h=node_name,ip,name,active,queue,rejected,completed,core,type,pool_size,queue_size"
node_name      ip           name  active queue rejected completed core type  pool_size queue_size
it_elk_node167 172.29.4.167 write      0     0        0   1305481      fixed        40       3000
it_elk_node169 172.29.4.169 write      0     0        0   4266653      fixed        40        200
it_elk_node168 172.29.4.168 write      0     0     7280   7097169      fixed        40        200
it_elk_node170 172.29.4.170 write      0     0        0   2271912      fixed        40       3000
it_elk_node171 172.29.4.171 write      0     0        0   4377790      fixed        40       3000

Similarly, the pool_size/queue_size of the get, ccr (cross-cluster replication), and search pools can be inspected with the same kind of command:

# View the full thread_pool details of every node
curl -X GET "127.0.0.1:9200/_nodes/thread_pool?pretty=true"


# View pool_size/queue_size for search
curl -X GET "127.0.0.1:9200/_cat/thread_pool/search?v&h=node_name,ip,name,active,queue,rejected,completed,core,type,pool_size,queue_size"
node_name      ip           name   active queue rejected completed core type                  pool_size queue_size
it_elk_node167 172.29.4.167 search      0     0        0      8418      fixed_auto_queue_size        61       1000
it_elk_node169 172.29.4.169 search      0     0        0     64923      fixed_auto_queue_size        61       1000
it_elk_node168 172.29.4.168 search      0     0        0    111409      fixed_auto_queue_size        61       1000
it_elk_node170 172.29.4.170 search      0     0        0     50890      fixed_auto_queue_size        61       1000
it_elk_node171 172.29.4.171 search      0     0        0     38820      fixed_auto_queue_size        61       1000

# View pool_size/queue_size for get
curl -X GET "127.0.0.1:9200/_cat/thread_pool/get?v&h=node_name,ip,name,active,queue,rejected,completed,core,type,pool_size,queue_size"
node_name      ip           name active queue rejected completed core type  pool_size queue_size
it_elk_node167 172.29.4.167 get       0     0        0       819      fixed        40       1000
it_elk_node169 172.29.4.169 get       0     0        0       331      fixed        40       1000
it_elk_node168 172.29.4.168 get       0     0        0      4756      fixed        40       1000
it_elk_node170 172.29.4.170 get       0     0        0         9      fixed         9       1000
it_elk_node171 172.29.4.171 get       0     0        0      6193      fixed        40       1000


# View pool_size/queue_size for ccr (cross-cluster replication)
curl -X GET "127.0.0.1:9200/_cat/thread_pool/ccr?v&h=node_name,ip,name,active,queue,rejected,completed,core,type,pool_size,queue_size"


# View all thread pools
curl -XGET "127.0.0.1:9200/_cat/thread_pool?v&h=id,ip,name,queue,rejected,completed"
id                     ip           name                queue rejected completed
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 analyze                 0        0         0
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 ccr                     0        0    140182
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 fetch_shard_started     0        0         0
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 fetch_shard_store       0        0        60
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 flush                   0        0      8700
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 force_merge             0        0         0
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 generic                 0        0   1267594
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 get                     0        0    158496
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 listener                0        0     16736
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 management              0        0   7373132
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 refresh                 0        0  15758014
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 rollup_indexing         0        0         0
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 search                  0        0   1474802
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 search_throttled        0        0         0
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 security-token-key      0        0         0
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 snapshot                0        0         2
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 transform_indexing      0        0         0
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 warmer                  0        0    749996
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 watcher                 0        0     23390
VbBadlehSNqlXKtWGlyUvA 172.29.4.156 write                   0        0  34229769
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 analyze                 0        0         0
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 ccr                     0        0     70051
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 fetch_shard_started     0        0         0
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 fetch_shard_store       0        0        34
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 flush                   0        0      8912
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 force_merge             0        0         0
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 generic                 0        0   1688212
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 get                     0        0     95389
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 listener                0        0     11281
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 management              0        0  11715898
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 refresh                 0        0  15957993
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 rollup_indexing         0        0         0
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 search                  0        0    884079
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 search_throttled        0        0         0
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 security-token-key      0        0         0
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 snapshot                0        0         0
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 transform_indexing      0        0         0
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 warmer                  0        0    482303
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 watcher                 0        0     46692
Py5AN5uKS7OWoDXwdr_ayw 172.29.4.157 write                   0        0  50279093
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 analyze                 0        0         0
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 ccr                     0        0     70124
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 fetch_shard_started     0        0         0
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 fetch_shard_store       0        0        50
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 flush                   0        0     14722
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 force_merge             0        0         0
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 generic                 0        0   1473305
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 get                     0        0    113200
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 listener                0        0     12675
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 management              0        0  27809756
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 refresh                 0        0  12937481
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 rollup_indexing         0        0         0
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 search                  0        0    987998
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 search_throttled        0        0         0
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 security-token-key      0        0         0
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 snapshot                0        0         0
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 transform_indexing      0        0         0
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 warmer                  0        0    670607
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 watcher                 0        0         0
wrKFBPZGQCm9kHxDAe8krw 172.29.4.158 write                   0        0  69187927


ElasticSearch 7.x: Fixing TooManyBucketsException

ElasticSearch 7.x raises the following: Caused by: org.elasticsearch.search.aggregations.MultiBucketConsumerService$TooManyBucketsException: Trying to create too many buckets. Must be less than or equal to: [10000] but was [10314]. This limit can be set by changing the [search.max_buckets] cluster level setting.

Analysis: this is behavior introduced in 6.x and later, intended to cap very large aggregations and avoid the performance risk they pose.

Fix 1: raise Elasticsearch's search.max_buckets limit

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{"persistent": { "search.max_buckets": 50000 }}'

Fix 2: constrain the time interval / document count so that fewer buckets are produced

To minimize these, either raise the min time interval at the datasource or panel level, or set the min doc count on the date histogram to 1.

Why it happens

The official Elasticsearch docs describe buckets as follows:

the buckets effectively define document sets. In addition to the buckets themselves, the bucket aggregations also compute and return the number of documents that “fell into” each bucket.

Put simply, a bucket is a set of documents. The way I understand it: the result set contains as many buckets as there are distinct groups of data in it. Below I walk through an example of what I think a bucket is (my understanding may not be exact; corrections welcome).

Suppose I have the following index, where type can only take one of five values: query[A], query[AAAA], forwarded, reply, cached

@timestamp type
Jun 23, 2020 @ 19:32:45.000 query[AAAA]
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 query[A]
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 config
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 forwarded
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 cached

Assume we query over the last 15 minutes.

If "Min time interval" is set to 1s, there are 15*(60/1) = 900 time slots, and each slot can contain 5 different buckets, which yields 900*5 = 4500 buckets.

If "Min time interval" is set to 30s, there are 15*(60/30) = 30 time slots, each with 5 different buckets, for 30*5 = 150 buckets.

This also explains nicely why increasing the Min time interval value makes the problem go away.
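
For reference, the math above corresponds to an aggregation of roughly this shape: a date histogram (one bucket per time slot) with a terms sub-aggregation on type (up to 5 buckets per slot). This is only a sketch; the index pattern and field names are invented for the example:

curl -X GET "127.0.0.1:9200/dns-log-*/_search?pretty" -H 'Content-Type: application/json' -d'{
  "size": 0,
  "query": { "range": { "@timestamp": { "gte": "now-15m" } } },
  "aggs": {
    "per_interval": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "30s", "min_doc_count": 1 },
      "aggs": { "per_type": { "terms": { "field": "type.keyword", "size": 5 } } }
    }
  }
}'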

References:
Increasing max_buckets for specific Visualizations
ElasticSearch search_phase_execution_exception
ElasticSearch 7.x too_many_buckets_exception #17327


ElasticSearch: Fixing UNASSIGNED SHARDS

How to deal with UNASSIGNED SHARDS in Elasticsearch.

First, check how many unassigned shards the cluster has, and whether shards are spread evenly:

$ curl -XGET "172.18.192.100:9200/_cat/allocation?v"
shards disk.indices disk.used disk.avail disk.total disk.percent host           ip             node
  2120        2.8tb     6.2tb     11.6tb     17.9tb           35 172.18.192.101 172.18.192.101 it-elk-node3
  3520        5.8tb     5.9tb       12tb     17.9tb           33 172.18.192.102 172.18.192.102 it-elk-node4
   764          1tb       2tb      9.3tb     11.3tb           17 172.18.192.100 172.18.192.100 it-elk-node2
  1707                                                                                         UNASSIGNED
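
To see why a particular shard is unassigned, the _cluster/allocation/explain API helps; called with no request body it picks one unassigned shard (if any) and explains it:

$ curl -XGET "172.18.192.100:9200/_cluster/allocation/explain?pretty"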

Generally, ES assigns unassigned shards to the nodes automatically. Use the following command to confirm that automatic shard allocation is switched on:

$ curl -XGET http://172.18.192.100:9200/_cluster/settings?pretty
{
  "persistent" : {
    "cluster" : {
      "max_shards_per_node" : "20000"    # a single node may hold at most 20000 shards
    },
    "xpack" : {
      "monitoring" : {
        "collection" : {
          "enabled" : "true"
        }
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "all"    # as long as cluster.routing.allocation.enable is all, ES assigns shards automatically
        }
      }
    }
  }
}

If automatic shard allocation is not enabled, it can be switched on with the following command.
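
A minimal sketch of that command, reusing the transient scope shown in the settings above:

$ curl -XPUT "172.18.192.100:9200/_cluster/settings" -H 'Content-Type: application/json' -d'{ "transient": { "cluster.routing.allocation.enable": "all" } }'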


ElasticSearch reports too many open files

Elasticsearch complains about too many open files; how do we investigate?

$ curl -XGET "172.18.192.100:9200/_nodes/stats/process?pretty"
{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "it-elk",
  "nodes" : {
    "rBm53XWOTk-2v3MHPa2FDA" : {
      "timestamp" : 1589854287039,
      "name" : "it-elk-node3",
      "transport_address" : "172.18.192.101:9300",
      "host" : "172.18.192.101",
      "ip" : "172.18.192.101:9300",
      "roles" : [
        "ingest",
        "master",
        "data"
      ],
      "attributes" : {
        "ml.machine_memory" : "134778376192",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "process" : {
        "timestamp" : 1589854286789,
        "open_file_descriptors" : 59595,    # file descriptors currently open
        "max_file_descriptors" : 65535,     # the maximum the system allows
        "cpu" : {
          "percent" : 3,
          "total_in_millis" : 86105320
        },
        "mem" : {
          "total_virtual_in_bytes" : 1669361537024
        }
      }
    }

Of course, the current limits can also be checked at the OS level:

$ ps -ef | grep elasticsearch    # find the PID of the process
elastic+ 128967      1 99 5月18 ?       1-13:22:07 /usr/share/elasticsearch/jdk/bin/java -Xms32g -Xmx32g -XX:+UseConcMarkSweepGC

$ cat /proc/128967/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             4096                 4096                 processes
Max open files            65535                65535                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       514069               514069               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
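
If the 65535 ceiling turns out to be too low, it can be raised. For a systemd-managed installation an override file is the usual route (a sketch; the value and paths are examples, adjust for your distro):

$ sudo mkdir -p /etc/systemd/system/elasticsearch.service.d
$ printf '[Service]\nLimitNOFILE=131072\n' | sudo tee /etc/systemd/system/elasticsearch.service.d/override.conf
$ sudo systemctl daemon-reload && sudo systemctl restart elasticsearch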

References:
https://www.elastic.co/guide/en/elasticsearch/guide/master/_file_descriptors_and_mmap.html
ElasticSearch: Unassigned Shards, how to fix?


The Lucene query syntax used by Kibana

Kibana uses the Lucene query syntax, which works not only in Kibana but also in Grafana.

Below is a quick introduction to how to use it.

Full-text search

Typing login in the search bar returns every document whose field values contain login.
Wrap words in double quotes to search for them as a phrase:

"like Gecko"

Fields

You can also search on the fields listed on the left-hand side of the page:

field:value      # full-text search restricted to one field
field:"value"    # exact search: put the keyword in double quotes
http_code:404    # documents whose HTTP status code is 404

Testing whether a field exists

_exists_:http_host    # results must contain an http_host field
_missing_:http_host   # results must not contain an http_host field
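
A few more constructs worth knowing (standard Lucene/query_string syntax; the field names are only examples):

status:200 AND extension:php                      # boolean operators, uppercase
status:(400 OR 404) AND NOT host:"example.com"    # grouping and negation
bytes:[1024 TO 8192]                              # inclusive range
response:>=400                                    # one-sided range shorthand
machine.os:win*                                   # wildcard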


Common Elasticsearch commands

1. Index

curl -X GET 'localhost:9200/_cat/indices?v'                          # list all indices

curl -X GET "localhost:9200/_cat/indices/INDEX_PATTERN-*?v&s=index"  # list indices matching a name pattern

curl -X GET "localhost:9200/_cat/indices?v&s=docs.count:desc"        # list indices sorted by document count

curl -XDELETE "localhost:9200/INDEX_NAME"                            # delete an index

curl -X GET "localhost:9200/INDEX_NAME/_count"                       # count the documents in an index

curl -XGET "127.0.0.1:9200/_all/_settings?pretty=true"               # show the settings of every index (lists them all; very long)

curl -XGET "127.0.0.1:9200/office_dns_log-*/_settings?pretty=true"   # show the settings of certain indices

curl -XGET "127.0.0.1:9200/office_dns_log-*/_settings/index.number_*?pretty=true" # show the shard and replica counts of certain indices

# Change the number of replicas of an index
curl -XPUT "localhost:9200/INDEX_NAME/_settings?pretty" -H 'Content-Type: application/json' -d' { "number_of_replicas": 2 }'

# Show an index's mapping
curl -XGET "127.0.0.1:9200/INDEX_NAME/_mapping?include_type_name=true&pretty=true"

Main reference:
Get index settings API

2. Node

(This section mainly draws on this article.)

curl -X GET "localhost:9200/_cat/nodes?v"    # 查看各节点内存使用状态及负载情况
ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.18.192.101           37          80   4    0.41    0.49     0.52 dim       -      it-elk-node3
172.18.192.102           69          90   4    1.19    1.53     1.56 dim       *      it-elk-node4
172.18.192.100           36         100   1    2.91    2.60     2.86 dim       -      it-elk-node2

curl -X GET "localhost:9200/_nodes/stats"
curl -X GET "localhost:9200/_nodes/nodeId1,nodeId2/stats"

# return just indices
curl -X GET "localhost:9200/_nodes/stats/indices"
# return just os and process
curl -X GET "localhost:9200/_nodes/stats/os,process"
# return just process for node with IP address 10.0.0.1
curl -X GET "localhost:9200/_nodes/10.0.0.1/stats/process"

# return just process
curl -X GET "localhost:9200/_nodes/process"
# same as above
curl -X GET "localhost:9200/_nodes/_all/process"
# return just jvm and process of only nodeId1 and nodeId2
curl -X GET "localhost:9200/_nodes/nodeId1,nodeId2/jvm,process"
# same as above
curl -X GET "localhost:9200/_nodes/nodeId1,nodeId2/info/jvm,process"
# return all the information of only nodeId1 and nodeId2
curl -X GET "localhost:9200/_nodes/nodeId1,nodeId2/_all"


# Fielddata summarised by node
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata?fields=field1,field2"
# Fielddata summarised by node and index
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata?level=indices&fields=field1,field2"
# Fielddata summarised by node, index, and shard
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata?level=shards&fields=field1,field2"
# You can use wildcards for field names
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata?fields=field*"

3. Segment

# View the segments of all indices (note: with many indices this list can be very long)
curl -u elastic:HMEaQXtLiJaD4zn1ZxzM -X GET "127.0.0.1:9200/_cat/segments?v"

# View the segments of one index
curl -u elastic:HMEaQXtLiJaD4zn1ZxzM -X GET "127.0.0.1:9200/_cat/segments/INDEX_PATTERN-*?v"

4. Templates

A template defines the settings of each index, the type of each field, and so on. It only affects indices created in the future, never indices that already exist.

# View all templates
curl -X GET "127.0.0.1:9200/_cat/templates?v&s=name"

# View a single template
curl -X GET "127.0.0.1:9200/_template/template_1?pretty=true"

# Set a template for certain indices (again: it only affects indices created later)
curl -XPUT 127.0.0.1:9200/_template/template_1 -H 'Content-Type: application/json' -d'{
  "index_patterns": ["office_dns*"],
  "settings" : {
    "index.refresh_interval": "30s",
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.translog.durability": "request",
    "index.translog.sync_interval": "30s"
  },
  "order" : 1
}'

# Notes:
index.refresh_interval: the index refresh interval, default 1s, i.e. how soon written data becomes searchable in ES.
number_of_shards: the number of shards (important); a good starting point is the number of nodes. Once an index grows past ~30 GB queries get slow, so be sure to spread the index out across shards.
number_of_replicas: the number of replicas.
index.translog.durability: how translog entries (index/update/delete, etc.) are persisted to disk; request is the default.
index.translog.sync_interval: the translog commit interval, default 5s.

Reference: https://blog.csdn.net/u014646662/article/details/99293604

5. License

# View the license
curl -XGET 'http://127.0.0.1:9200/_license'

# Delete the license
curl -X DELETE "localhost:9200/_license"

# Import a license (here from the local file aaa.json); if username/password auth is enabled, add credentials, e.g. -u elastic:password
curl -XPUT 'http://127.0.0.1:9200/_xpack/license' -H "Content-Type: application/json" -d @aaa.json

6. Shard management

curl -XGET "127.0.0.1:9200/_cluster/settings?pretty"    # 查看集群最大分片数量
{
  "persistent" : {
    "cluster" : {
      "max_shards_per_node" : "30000"    # 单个节点能容纳30000个shards,默认值是1000
    }
    "xpack" : 
      "monitoring" : 
        "collection" : {
          "enabled" : "true"
        }
      }
    }
  }
}


curl -XGET "127.0.0.1:9200/_cluster/health?pretty"    # 查看当前shards使用数量
{
  "cluster_name" : "it-elk",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 7233,
  "active_shards" : 7248,
  "relocating_shards" : 0,
  "initializing_shards" : 12,
  "unassigned_shards" : 5323,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 2085,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 3167844,
  "active_shards_percent_as_number" : 57.601525868234916        #数据的正常率,100表示一切ok
}

# Count the unassigned shards
curl -XGET "127.0.0.1:9200/_cat/shards?h=index,shard,prirep,state,unassigned.*&pretty"|grep UNASSIGNED | wc -l

# List unassigned shards along with the reason they are unassigned
curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
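
Once the underlying cause is fixed, shards that have exhausted their allocation retries can be retried explicitly:

curl -XPOST "127.0.0.1:9200/_cluster/reroute?retry_failed=true&pretty"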

7. Template management

The commands for listing templates are the same as in section 4; below is a fuller example that also sets field mappings.

# Define a template named mail so that future mail-w3svc1-* indices get the settings below
curl -XPUT 127.0.0.1:9200/_template/mail -H 'Content-Type: application/json' -d'{
  "index_patterns": ["mail-w3svc1-*"],
  "settings" : {
    "index.refresh_interval": "30s",
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.translog.durability": "request",
    "index.translog.sync_interval": "30s"
  },
  "mappings": {
    "properties": {
      "rt": { "type": "integer" },
      "status": { "type": "integer" },
      "width": { "type": "float" },
      "uri": { "type": "text", "fielddata": true },
      "username": { "type": "text", "fielddata": true },
      "server_ip": { "type": "text", "fielddata": true },
      "client_ip": { "type": "text", "fielddata": true }
    }
  },
  "order" : 1
}'

# Notes:
1. index.refresh_interval: the index refresh interval, default 1s, i.e. how soon written data becomes searchable.
   Refreshing is expensive, so for a large initial import you can set it longer (e.g. 30s) and lower it back to 5s or 10s afterwards.
2. number_of_shards: the number of primary shards (important), default 1.
   Once an index grows past ~30 GB queries get slow; spread the index out by raising the shard count.
3. number_of_replicas: the number of replicas, default 1.
   Note that an index with 5 shards and 1 replica ends up as 5*(1+1)=10 shard copies in total, which can drag cluster performance down.
   Suggestion: number of nodes <= number of shards * (number of replicas + 1)
4. index.translog.durability: how translog entries (index/update/delete operations, etc.) are persisted to disk; request is the default.
5. index.translog.sync_interval: the translog commit interval, default 5s.
6. mappings.properties: declares the rt/status fields as integers, and enables fielddata on username/server_ip and the other text fields so external tools can aggregate on them.
   See https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html for the common field types.

# View the mail template defined above
curl -X GET "127.0.0.1:9200/_template/mail?pretty=true"

8. Document

# Index a document (the index is created automatically if it does not exist)
curl -XPOST "127.0.0.1:9200/INDEX_NAME/_doc/1" -H "Content-Type: application/json" -d'{"name": "zhu kun"}'

# Fetch a document
curl -XGET "127.0.0.1:9200/INDEX_NAME/_doc/1?pretty=true"

# Search for documents
curl -XGET "127.0.0.1:9200/INDEX_NAME/_search?q=name:zhu&pretty=true"

9. One caveat

In general, a text field can only be sorted and searched from Kibana. If you need to aggregate on it (sorting, statistics, rollups, etc.) from an external tool such as Grafana, you must set fielddata to true on the field, or you may get the error below (see this page in the official docs):

Fielddata is disabled on text fields by default. Set `fielddata=true` on [`your_field_name`] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
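
Enabling fielddata on an existing index follows the pattern from the official docs (the index and field names below are placeholders; on 7.x the mapping endpoint is typeless):

curl -XPUT "127.0.0.1:9200/INDEX_NAME/_mapping" -H 'Content-Type: application/json' -d'{
  "properties": {
    "username": { "type": "text", "fielddata": true }
  }
}'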

References:
https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/#reason-2-too-many-shards-not-enough-nodes
