Fixing TooManyBucketsException in Elasticsearch 7.x

Elasticsearch 7.x reports the following error: Caused by: org.elasticsearch.search.aggregations.MultiBucketConsumerService$TooManyBucketsException: Trying to create too many buckets. Must be less than or equal to: [10000] but was [10314]. This limit can be set by changing the [search.max_buckets] cluster level setting.

Analysis: this limit was introduced in the 6.x series; here it is 10000 buckets per search, as the message shows. Its purpose is to cap very large aggregations and avoid the performance risk they carry.

Solution 1: raise Elasticsearch's search.max_buckets limit

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{"persistent": { "search.max_buckets": 50000 }}'

Solution 2: constrain the time interval or the document count so that fewer buckets are created

To reduce the number of buckets, either increase the Min time interval at the datasource or panel level, or set Min Doc Count on the date histogram to 1.
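
As a rough illustration of the second approach, the sketch below prints a search body whose date histogram uses a coarser interval and a min_doc_count of 1, so intervals with no documents produce no bucket. The helper name build_histogram_query, the 15-minute range and the type field are assumptions made for this example. fixed_interval assumes Elasticsearch 7.2 or later (earlier versions use interval), and the type field must be aggregatable, for instance a keyword field.

```shell
# build_histogram_query INTERVAL   (hypothetical helper, e.g. INTERVAL=30s)
# Prints a search body: one date_histogram bucket per INTERVAL, and inside
# each of them one terms bucket per distinct value of the "type" field.
build_histogram_query() {
  interval="$1"
  cat <<EOF
{
  "size": 0,
  "query": { "range": { "@timestamp": { "gte": "now-15m" } } },
  "aggs": {
    "per_interval": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "${interval}",
        "min_doc_count": 1
      },
      "aggs": {
        "by_type": { "terms": { "field": "type" } }
      }
    }
  }
}
EOF
}
```

The body can then be piped to the search API of whichever index holds the logs, for example: build_histogram_query 30s | curl -s -X POST "localhost:9200/INDEX/_search" -H 'Content-Type: application/json' -d @- (INDEX is a placeholder).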

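Root cause analysis

The official Elasticsearch documentation describes buckets as follows:

the buckets effectively define document sets. In addition to the buckets themselves, the bucket aggregations also compute and return the number of documents that “fell into” each bucket.

Put simply, a bucket is a set of documents. As I understand it, a query produces one bucket for every distinct group of documents in its result set. The example below illustrates what I take a bucket to be (my understanding may not be entirely correct; corrections are welcome).

Suppose I have the following index, where type takes essentially five values: query[A], query[AAAA], forwarded, reply, cached (the sample below also contains one config row, which the estimate ignores).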

@timestamp type
Jun 23, 2020 @ 19:32:45.000 query[AAAA]
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 query[A]
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 config
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 forwarded
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 cached

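Assume the query covers the past 15 minutes.

With "Min time interval" set to 1s there are 15*(60/1)=900 time slots, and each slot can hold up to 5 different buckets, so the query can produce up to 900*5=4500 buckets.

With "Min time interval" set to 30s there are 15*(60/30)=30 time slots, and each slot can hold up to 5 different buckets, so the query can produce up to 30*5=150 buckets.

The count grows in proportion to the time range: at 1s, a 24-hour query gives 86400*5=432000 buckets, far above the 10000 limit. This explains why increasing Min time interval resolves the problem.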
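
The arithmetic above can be reproduced with a small sketch. Both helper names are hypothetical, and the result is only an upper bound that assumes every type value occurs in every time slot.

```shell
# estimate_buckets MINUTES INTERVAL_SECONDS TYPES
# Upper bound on buckets: number of time slots times distinct type values.
estimate_buckets() {
  echo $(( $1 * 60 / $2 * $3 ))
}

# min_interval RANGE_SECONDS TYPES LIMIT
# Smallest whole-second interval whose estimate stays within LIMIT,
# i.e. the ceiling of RANGE_SECONDS * TYPES / LIMIT.
min_interval() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

estimate_buckets 15 1 5      # prints 4500
estimate_buckets 15 30 5     # prints 150
min_interval 86400 5 10000   # prints 44
```

Under these assumptions, a 24-hour query over 5 type values needs an interval of at least 44 seconds to stay within 10000 buckets.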

References:
Increasing max_buckets for specific Visualizations
ElasticSearch search_phase_execution_exception
ElasticSearch 7.x too_many_buckets_exception #17327

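Installing and configuring Kafka on CentOS 7

Logstash has limited processing capacity. To keep a peak in log volume from bringing Logstash down, Kafka is commonly placed in front of it to buffer log messages.

Kafka depends on ZooKeeper, so ZooKeeper must be installed and configured first, and Kafka afterwards.

System environment:

All servers run CentOS 7.8 64-bit.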

IP address     ZooKeeper install dir   ZooKeeper data dir     Kafka install dir   Kafka internal hostname
172.29.4.168   /data/zookeeper         /data/zookeeper/data   /data/kafka         kafka1.zhukun.net
172.29.4.169   /data/zookeeper         /data/zookeeper/data   /data/kafka         kafka2.zhukun.net
172.29.4.170   /data/zookeeper         /data/zookeeper/data   /data/kafka         kafka3.zhukun.net

1. Deploy and configure ZooKeeper

Perform the following steps on all three servers.

$ wget http://mirror.bit.edu.cn/apache/zookeeper/stable/apache-zookeeper-3.5.8-bin.tar.gz
$ tar zxvf apache-zookeeper-3.5.8-bin.tar.gz
$ mv apache-zookeeper-3.5.8-bin /data/
$ ln -s /data/apache-zookeeper-3.5.8-bin /data/zookeeper && mkdir /data/zookeeper/data

Prepare the ZooKeeper configuration file

$ vim /data/zookeeper/conf/zoo.cfg    # add the following configuration
dataDir=/data/zookeeper/data
clientPort=2181
maxClientCnxns=0
admin.enableServer=false
# admin.serverPort=8080
initLimit=10
syncLimit=5
server.1=172.29.4.168:2888:3888
server.2=172.29.4.169:2888:3888
server.3=172.29.4.170:2888:3888
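
In a ZooKeeper ensemble, each server reads its own id from a file named myid inside dataDir, and that id must equal the N of its server.N line. As a minimal sketch, the hypothetical helper below looks up the id for a given IP address in zoo.cfg so that the same command can be used on all three machines. It assumes the server.N lines use plain IP addresses, as in the configuration above.

```shell
# myid_for_ip ZOO_CFG IP   (hypothetical helper)
# Prints the N of the "server.N=IP:port:port" line whose host part equals IP.
myid_for_ip() {
  awk -v ip="$2" '
    /^server\.[0-9]+=/ {
      split($0, kv, "=")        # kv[1] = "server.N", kv[2] = "host:port:port"
      split(kv[2], addr, ":")   # addr[1] = host part
      if (addr[1] == ip) {
        sub(/^server\./, "", kv[1])
        print kv[1]
      }
    }
  ' "$1"
}
```

On 172.29.4.169, for example, the id could be written with: myid_for_ip /data/zookeeper/conf/zoo.cfg 172.29.4.169 > /data/zookeeper/data/myid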

Next, prepare the system service.

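Fixing UNASSIGNED SHARDS in Elasticsearch

How to resolve unassigned shards in Elasticsearch

First, check how many shards in the cluster are unassigned and whether the shards are evenly distributed across the nodes.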

$ curl -XGET "172.18.192.100:9200/_cat/allocation?v"
shards disk.indices disk.used disk.avail disk.total disk.percent host           ip             node
  2120        2.8tb     6.2tb     11.6tb     17.9tb           35 172.18.192.101 172.18.192.101 it-elk-node3
  3520        5.8tb     5.9tb       12tb     17.9tb           33 172.18.192.102 172.18.192.102 it-elk-node4
   764          1tb       2tb      9.3tb     11.3tb           17 172.18.192.100 172.18.192.100 it-elk-node2
  1707                                                                                         UNASSIGNED
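
To track these numbers over time, the output above can be reduced to two figures. The sketch below assumes the column layout shown here (shard count in the first column, node name or UNASSIGNED in the last); both helper names are hypothetical.

```shell
# unassigned_shards: reads _cat/allocation output on stdin and prints the
# shard count of the UNASSIGNED row (0 if there is no such row).
unassigned_shards() {
  awk '$NF == "UNASSIGNED" { print $1; found = 1 } END { if (!found) print 0 }'
}

# assigned_shards: sums the shard counts of all node rows.
# The header row is skipped because its first field is not a number.
assigned_shards() {
  awk '$1 ~ /^[0-9]+$/ && $NF != "UNASSIGNED" { sum += $1 } END { print sum + 0 }'
}
```

For example: curl -s "172.18.192.100:9200/_cat/allocation?v" | unassigned_shards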

Normally ES assigns unassigned shards to the nodes automatically. Use the following command to confirm that automatic shard allocation is enabled.

$ curl -XGET http://172.18.192.100:9200/_cluster/settings?pretty
{
  "persistent" : {
    "cluster" : {
      "max_shards_per_node" : "20000"    # a node may hold at most 20000 shards
    },
    "xpack" : {
      "monitoring" : {
        "collection" : {
          "enabled" : "true"
        }
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "all"    # as long as cluster.routing.allocation.enable is all, ES allocates shards automatically
        }
      }
    }
  }
}

If automatic shard allocation is not enabled, enable it.
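
The author's own command is not part of this excerpt. One way to check and re-enable allocation through the standard cluster settings API can be sketched as follows. Both function names are hypothetical, and the choice of a transient setting is an assumption that mirrors where the value appears in the output above.

```shell
# allocation_setting: reads flat cluster settings JSON on stdin and prints the
# value of cluster.routing.allocation.enable, once per section that sets it.
allocation_setting() {
  sed -n 's/.*"cluster\.routing\.allocation\.enable"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# enable_allocation HOST:PORT
# Sets cluster.routing.allocation.enable to "all" as a transient setting.
enable_allocation() {
  curl -s -X PUT "$1/_cluster/settings" -H 'Content-Type: application/json' \
    -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'
}
```

For example: curl -s "172.18.192.100:9200/_cluster/settings?flat_settings=true&pretty" | allocation_setting. If nothing is printed, the setting is not set explicitly and the default (all) applies. If it prints none or primaries, enable_allocation 172.18.192.100:9200 switches it back.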

Elasticsearch reports too many open files

Elasticsearch reports too many open files. How do you analyze and locate the cause?

$ curl -XGET "172.18.192.100:9200/_nodes/stats/process?pretty"
{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "it-elk",
  "nodes" : {
    "rBm53XWOTk-2v3MHPa2FDA" : {
      "timestamp" : 1589854287039,
      "name" : "it-elk-node3",
      "transport_address" : "172.18.192.101:9300",
      "host" : "172.18.192.101",
      "ip" : "172.18.192.101:9300",
      "roles" : [
        "ingest",
        "master",
        "data"
      ],
      "attributes" : {
        "ml.machine_memory" : "134778376192",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "process" : {
        "timestamp" : 1589854286789,
        "open_file_descriptors" : 59595,    # file descriptors currently open
        "max_file_descriptors" : 65535,     # maximum file descriptors the process may open
        "cpu" : {
          "percent" : 3,
          "total_in_millis" : 86105320
        },
        "mem" : {
          "total_virtual_in_bytes" : 1669361537024
        }
      }
    }
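
To judge how close a node is to its limit, the two values can be turned into a usage percentage and a remaining headroom. This is a minimal sketch with hypothetical helper names, using integer arithmetic (the percentage is rounded down).

```shell
# fd_usage_percent OPEN MAX: share of the descriptor limit in use, rounded down.
fd_usage_percent() {
  echo $(( $1 * 100 / $2 ))
}

# fd_headroom OPEN MAX: descriptors still available before the limit is hit.
fd_headroom() {
  echo $(( $2 - $1 ))
}

fd_usage_percent 59595 65535   # prints 90
fd_headroom 59595 65535        # prints 5940
```

With the figures above, it-elk-node3 has used about 90% of its descriptors and has 5940 left.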

The current limits can also be checked at the operating system level.

$ ps -ef | grep elasticsearch    # find the PID of the process
elastic+ 128967      1 99 May18 ?       1-13:22:07 /usr/share/elasticsearch/jdk/bin/java -Xms32g -Xmx32g -XX:+UseConcMarkSweepGC

$ cat /proc/128967/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             4096                 4096                 processes
Max open files            65535                65535                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       514069               514069               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
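
The relevant row can be pulled out of this file, and compared with the number of descriptors the process actually holds, with a short sketch like the one below. The helper names are hypothetical. Listing /proc/PID/fd of another user's process normally requires root or running as that user.

```shell
# max_open_files LIMITS_FILE
# Prints the soft and hard "Max open files" limits, separated by a space.
max_open_files() {
  awk '/^Max open files/ { print $4, $5 }' "$1"
}

# open_fd_count PID
# Counts the file descriptors currently held by the process.
open_fd_count() {
  ls "/proc/$1/fd" | wc -l
}
```

For example: max_open_files /proc/128967/limits prints the two limits, and sudo ls /proc/128967/fd | wc -l gives the number currently open.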

References:
https://www.elastic.co/guide/en/elasticsearch/guide/master/_file_descriptors_and_mmap.html
ElasticSearch: Unassigned Shards, how to fix?

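Port forwarding with socat on Ubuntu

I previously wrote about port forwarding with iptables; this post covers using socat to forward traffic from a local port to a remote machine. Don't ask me what this is good for, I don't know either.

Installation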

$ sudo apt install socat

Forwarding a TCP port

$ sudo vim /etc/systemd/system/socat.service    # add the following content
[Unit]
Description=socat (https://www.zhukun.net)
After=network-online.target
Wants=network-online.target

[Service]
User=root
Group=root
ExecStart=/usr/bin/socat TCP4-LISTEN:LOCAL_PORT,reuseaddr,fork TCP4:REMOTE_IP:REMOTE_PORT
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target

Forwarding a UDP port

$ sudo vim /etc/systemd/system/socat_udp.service    # add the following content
[Unit]
Description=socat_udp (https://www.zhukun.net)
After=network-online.target
Wants=network-online.target

[Service]
User=root
Group=root
ExecStart=/usr/bin/socat -T5 UDP4-LISTEN:LOCAL_PORT,reuseaddr,fork UDP4:REMOTE_IP:REMOTE_PORT
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target
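
Because the two unit files differ only in the protocol and the addresses, they can also be generated from parameters. The sketch below is one way to do that; the function name render_socat_unit and the Description text are made up for this example, while the socat options are the ones used in the unit files above.

```shell
# render_socat_unit PROTO LOCAL_PORT REMOTE_IP REMOTE_PORT   (PROTO: tcp or udp)
# Prints a systemd unit that forwards LOCAL_PORT to REMOTE_IP:REMOTE_PORT.
render_socat_unit() {
  proto="$1"; local_port="$2"; remote_ip="$3"; remote_port="$4"
  case "$proto" in
    tcp) listen="TCP4-LISTEN"; target="TCP4"; opts="" ;;
    udp) listen="UDP4-LISTEN"; target="UDP4"; opts="-T5 " ;;   # -T5 as in the UDP unit above
    *)   echo "usage: render_socat_unit tcp|udp LOCAL_PORT REMOTE_IP REMOTE_PORT" >&2
         return 1 ;;
  esac
  cat <<EOF
[Unit]
Description=socat ${proto} forward ${local_port} to ${remote_ip}:${remote_port}
After=network-online.target
Wants=network-online.target

[Service]
User=root
Group=root
ExecStart=/usr/bin/socat ${opts}${listen}:${local_port},reuseaddr,fork ${target}:${remote_ip}:${remote_port}
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, with made-up addresses: render_socat_unit tcp 8080 10.0.0.5 80 | sudo tee /etc/systemd/system/socat.service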

Starting the services

$ sudo systemctl daemon-reload
$ sudo systemctl start socat.service
$ sudo systemctl start socat_udp.service
$ sudo systemctl enable socat.service
$ sudo systemctl enable socat_udp.service