Fixing TooManyBucketsException in Elasticsearch 7.x

Elasticsearch 7.x reports the following error: Caused by: org.elasticsearch.search.aggregations.MultiBucketConsumerService$TooManyBucketsException: Trying to create too many buckets. Must be less than or equal to: [10000] but was [10314]. This limit can be set by changing the [search.max_buckets] cluster level setting.

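Analysis: this is a feature of versions 6.x and later. It caps the number of buckets a single aggregation request may create, to guard against the performance risk of very large aggregations.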

Solution 1: raise Elasticsearch's search.max_buckets limit

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{"persistent": { "search.max_buckets": 50000 }}'
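Before raising the limit it can help to check which value the cluster is actually using, and to know how to undo the change afterwards. The following is a minimal sketch, assuming the same localhost:9200 address as the command above. The ES_URL variable and the function names show_max_buckets and reset_max_buckets are made up for illustration; the filter_path parameter is only there to trim the response.

```shell
#!/bin/sh
# Assumption: Elasticsearch listens on the same address as in the command above.
ES_URL="${ES_URL:-localhost:9200}"

# Print search.max_buckets as the cluster sees it. include_defaults=true
# also returns the built-in default when nothing has been set explicitly.
show_max_buckets() {
  curl -s -X GET "$ES_URL/_cluster/settings?include_defaults=true&filter_path=*.search.max_buckets"
}

# Remove the persistent override again; assigning null to a cluster
# setting resets it to its default value.
reset_max_buckets() {
  curl -s -X PUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d '{"persistent": {"search.max_buckets": null}}'
}

# Usage (requires a running cluster):
#   show_max_buckets
#   reset_max_buckets
```

Raising the limit only moves the threshold. A request that creates 50000 buckets still costs memory on the coordinating node, so this is better treated as a stopgap than as the fix.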

Solution 2: constrain the time interval or the document count so that fewer buckets are created

To reduce the number of buckets, either increase the min time interval at the datasource or panel level, or set min doc count on the date histogram to 1.
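At the query level, those two knobs correspond to the interval and min_doc_count options of a date_histogram aggregation. The sketch below builds such a request under several assumptions: the index name my-index is hypothetical, type is assumed to be mapped as a keyword field, the 15 minute range is just an example, and fixed_interval assumes Elasticsearch 7.2 or later (older versions use interval instead). The function names are made up for illustration.

```shell
#!/bin/sh
ES_URL="${ES_URL:-localhost:9200}"   # assumption: local single-node cluster
INDEX="${INDEX:-my-index}"           # hypothetical index name

# Print a search body with one histogram bucket per interval, each split
# by type. min_doc_count 1 leaves empty intervals out of the response.
histogram_body() {
  interval="$1"   # for example 30s
  cat <<EOF
{
  "size": 0,
  "query": { "range": { "@timestamp": { "gte": "now-15m" } } },
  "aggs": {
    "per_interval": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "$interval",
        "min_doc_count": 1
      },
      "aggs": { "by_type": { "terms": { "field": "type" } } }
    }
  }
}
EOF
}

# Send the body to the _search endpoint; -d @- makes curl read it from stdin.
run_histogram() {
  histogram_body "$1" | curl -s -X GET "$ES_URL/$INDEX/_search" \
    -H 'Content-Type: application/json' -d @-
}

# Usage (requires a running cluster):
#   run_histogram 30s
```

Calling histogram_body 30s on its own prints the JSON, which is a convenient way to inspect the request before sending it.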

Root cause

The official Elasticsearch documentation describes buckets as follows:

the buckets effectively define document sets. In addition to the buckets themselves, the bucket aggregations also compute and return the number of documents that “fell into” each bucket.

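Put simply, a bucket is a set of documents. As I understand it, a query result contains one bucket for each distinct group of documents in it. The example below illustrates what I take a bucket to be (my understanding may not be entirely correct; corrections are welcome).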

Suppose I have the following index, where type can only take one of these 5 values: query[A], query[AAAA], forwarded, reply, cached

@timestamp type
Jun 23, 2020 @ 19:32:45.000 query[AAAA]
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 query[A]
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 config
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 forwarded
Jun 23, 2020 @ 19:32:45.000 cached
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 reply
Jun 23, 2020 @ 19:32:45.000 cached
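To make the idea of a bucket concrete, the sketch below groups the type column of the sample rows above and counts the rows in each group. This is roughly what a terms aggregation on type would report for this one second, but it runs locally with awk and does not talk to Elasticsearch. The names SAMPLE_TYPES and count_by_type are made up for illustration.

```shell
#!/bin/sh
# The type column of the 19 sample rows, in their original order.
SAMPLE_TYPES='query[AAAA]
reply
cached
cached
reply
reply
query[A]
cached
reply
reply
reply
cached
config
reply
forwarded
cached
reply
reply
cached'

# Group identical values and print "value count" for each group, the
# local equivalent of bucket keys with their document counts.
count_by_type() {
  awk 'NF { count[$1]++ } END { for (t in count) print t, count[t] }' | LC_ALL=C sort
}

printf '%s\n' "$SAMPLE_TYPES" | count_by_type
```

Each output line corresponds to one bucket: reply holds 9 documents and cached holds 6. The sample also happens to contain a single config row, so this particular second yields six groups, while the calculation that follows assumes five values of type.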

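Suppose the query covers the last 15 minutes.

If "Min time interval" is set to 1s, there are 15*(60/1)=900 time slots. Each slot can contain up to 5 different buckets (one per type), which produces up to 900*5=4500 buckets.

If "Min time interval" is set to 30s, there are 15*(60/30)=30 time slots. Each slot can again contain up to 5 different buckets, which produces up to 30*5=150 buckets.

This explains why increasing the Min time interval value solves the problem.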
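The arithmetic above is easy to turn into a small helper for estimating whether a given time range will hit the limit. This is a rough sketch: it assumes every time slot contains every type, which is the worst case, and the function names bucket_count and min_interval are made up for illustration.

```shell
#!/bin/sh
# bucket_count RANGE_MINUTES INTERVAL_SECONDS DISTINCT_TYPES
# Worst-case estimate: number of time slots times number of types.
bucket_count() {
  echo $(( $1 * 60 / $2 * $3 ))
}

# min_interval RANGE_MINUTES DISTINCT_TYPES LIMIT
# Smallest whole number of seconds per slot that keeps the estimate
# at or below LIMIT.
min_interval() {
  range_s=$(( $1 * 60 ))
  # Ceiling division: (a + b - 1) / b
  echo $(( (range_s * $2 + $3 - 1) / $3 ))
}

bucket_count 15 1 5    # prints 4500
bucket_count 15 30 5   # prints 150
```

As a usage example, min_interval 1440 5 10000 prints 44: for a 24 hour range with 5 types, an interval of at least 44 seconds keeps the estimate under a limit of 10000.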

