Press "Enter" to skip to content

分类: Linux运维

Linux,unix,mysql,database,oracle,mysql

调用Elasticsearch API查询数据, 并将数据导出为csv格式

1, 调用Elasticsearch API查询数据

下面的命令是一条标准的查询语句

curl -XGET http://127.0.0.1:9200/my_index-*/_search -H 'Content-Type: application/json' -d'
{
  "size": 10000,
  "query": {
    "bool": {
      "filter": [
        { "match_all": {} },
        { "match_phrase": { "id": 20202162488675 } },
        { "match_phrase": { "my_site": "www.zhukun.net" } },
        { "match_phrase": { "log_level": "error" } },
        { "range": { "@timestamp": { "gte": "2022-06-09T00:00:00.000+08:00", "lte": "2022-06-09T23:59:59.999+08:00" } } }
      ]
    }
  }
}'

2, 将查询到的数据导出为csv格式

首先, 我们将上面的命令查询的结果重定向到a.json文件中, 然后使用jq命令将json中的数据导出为csv

jq '[ .hits.hits[]._source ] | map(del(.message)) | (map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' a.json > a.csv

3, 一些注意事项

3.1 查询命令里的”size”: 10000表示查询结果的最大显示数量, 受到ES的index.max_result_window的限制(默认限制是10000条), 如果要修改此限制可使用如下命令:

curl -XPUT http://127.0.0.1:9200/my_index-*/_settings -H 'Content-Type: application/json' -d'
{
  "index": { "max_result_window": 50000 }
}'

3.2 关于jq命令的用法

  • [ .hits.hits[]._source ]表示将所有结果放在一个list里
  • map(del(.message))表示删除结果里的message这个k-v
  • map(keys) | add | unique表示提取所有的key
Leave a Comment

内网穿透: 使用ssh tunnel将内网主机映射到公网

如果希望将一台内网中的主机发布到公网(使用阿里云/腾讯云中转的方式), 使得该内网主机可以在全球任意地点被访问, 仅需要用到ssh即可. 用到的原理就是ssh的remote port forwarding特性, 具体可参考本博客之前写的简单解释 ssh 中的 local port forwarding 和 remote port forwarding.

假设我们已经有了一台阿里云/腾讯云的主机, 其公网IP是1.1.1.1, 需要在ssh配置里启用GatewayPorts(否则ssh tunnel建立以后只会监听127.0.0.1)

在我们的内网主机上写入一个systemd服务

sudo vim /etc/systemd/system/[email protected]    # 写入如下内容
[Unit]
Description=Persistent SSH Tunnel to from port 127.0.0.1:%i on this server to port 3389 on external server(1.1.1.1:3389)
After=network.target
 
[Service]
Environment="LOCAL_PORT=%i"
ExecStart=/usr/bin/ssh -NTC -o ServerAliveInterval=60 -o ExitOnForwardFailure=yes -R 3389:0.0.0.0:${LOCAL_PORT} [email protected]
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

参数解释:
T: 禁止分配伪终端
N: 不执行远程指令
C: 请求压缩所有数据

然后启用服务

sudo systemctl daemon-reload
sudo systemctl enable [email protected] --now

然后就可以通过1.1.1.1:3389来访问这台内网主机的22端口了.

参考文档: README-setup-tunnel-as-systemd-service.md

Leave a Comment

CentOS 7解决arp欺骗

某天发现一台CentOS机器网络一切正常, 但是却无法正常上网. 后经查询, 发现该机器的网关mac地址与实际网关的mac地址不符.  下面是查看和解决办法.

查看arp

$ cat /proc/net/arp 
IP address       HW type     Flags       HW address            Mask     Device
192.168.43.62    0x1         0x2         24:6e:96:93:c9:7d     *        eth0
192.168.43.61    0x1         0x2         24:6e:96:8c:e0:65     *        eth0
192.168.43.154   0x1         0x0         00:00:00:00:00:00     *        eth0
192.168.43.153   0x1         0x2         52:54:00:fa:bb:fc     *        eth0
192.168.43.60    0x1         0x0         00:00:00:00:00:00     *        eth0
192.168.43.59    0x1         0x0         00:00:00:00:00:00     *        eth0
192.168.43.151   0x1         0x2         52:54:00:aa:73:e2     *        eth0
192.168.43.11    0x1         0x0         00:00:00:00:00:00     *        eth0
192.168.43.31    0x1         0x2         00:be:75:c7:47:ea     *        eth0
192.168.43.111   0x1         0x0         00:00:00:00:00:00     *        eth0
192.168.43.27    0x1         0x2         52:54:00:b4:3f:a3     *        eth0
192.168.43.224   0x1         0x0         00:00:00:00:00:00     *        eth0
192.168.43.26    0x1         0x2         52:54:00:33:50:7e     *        eth0
192.168.43.223   0x1         0x0         00:00:00:00:00:00     *        eth0
192.168.43.25    0x1         0x2         52:54:00:b4:3f:a3     *        eth0
192.168.43.1     0x1         0x2         3c:f5:cc:91:79:87     *        eth0
192.168.43.163   0x1         0x2         52:54:00:78:13:7d     *        eth0
192.168.43.162   0x1         0x2         52:54:00:7f:83:a4     *        eth0
192.168.43.104   0x1         0x2         52:54:00:34:0c:fc     *        eth0
192.168.43.21    0x1         0x0         00:00:00:00:00:00     *        eth0
192.168.43.44    0x1         0x2         24:6e:96:93:a3:c4     *        eth0
192.168.43.43    0x1         0x2         24:6e:96:8c:df:64     *        eth0
192.168.43.158   0x1         0x2         52:54:00:9a:ff:9f     *        eth0
192.168.43.122   0x1         0x2         52:54:00:99:e5:5e     *        eth0


$ arp -a
? (192.168.43.62) at 24:6e:96:93:c9:7d [ether] on eth0
? (192.168.43.61) at 24:6e:96:8c:e0:65 [ether] on eth0
? (192.168.43.154) at <incomplete> on eth0
? (192.168.43.153) at 52:54:00:fa:bb:fc [ether] on eth0
? (192.168.43.60) at <incomplete> on eth0
? (192.168.43.59) at <incomplete> on eth0
? (192.168.43.151) at 52:54:00:aa:73:e2 [ether] on eth0
? (192.168.43.11) at <incomplete> on eth0
? (192.168.43.31) at 00:be:75:c7:47:ea [ether] on eth0
? (192.168.43.111) at <incomplete> on eth0
? (192.168.43.27) at 52:54:00:b4:3f:a3 [ether] on eth0
? (192.168.43.224) at <incomplete> on eth0
? (192.168.43.26) at 52:54:00:33:50:7e [ether] on eth0
? (192.168.43.223) at <incomplete> on eth0
? (192.168.43.25) at 52:54:00:b4:3f:a3 [ether] on eth0
gateway (192.168.43.1) at 3c:f5:cc:91:79:87 [ether] on eth0
? (192.168.43.163) at 52:54:00:78:13:7d [ether] on eth0
? (192.168.43.162) at 52:54:00:7f:83:a4 [ether] on eth0
? (192.168.43.104) at 52:54:00:34:0c:fc [ether] on eth0
? (192.168.43.21) at <incomplete> on eth0
? (192.168.43.44) at 24:6e:96:93:a3:c4 [ether] on eth0
? (192.168.43.43) at 24:6e:96:8c:df:64 [ether] on eth0
? (192.168.43.158) at 52:54:00:9a:ff:9f [ether] on eth0

绑定arp

绑定arp的过程在某些国外网站称之为Create a Static ARP Table. 下面演示手动绑定网关192.168.43.1的mac地址为74:ea:c8:2d:9f:f6

arp -s 192.168.43.1 74:ea:c8:2d:9f:f6

 

Leave a Comment

remove a node from ElasticSearch cluster

1, stop shard allocation for this node

$ curl -XGET "127.0.0.1:9200/_cat/allocation?v"
shards disk.indices disk.used disk.avail disk.total disk.percent host         ip           node
   412      960.3gb     1.8tb     15.6tb     17.4tb           10 172.29.4.156 172.29.4.156 es_node_156_2
   411      478.9gb     1.5tb     15.9tb     17.4tb            8 172.29.4.158 172.29.4.158 es_node_158_2
   411      557.5gb   558.7gb     16.9tb     17.4tb            3 172.29.4.157 172.29.4.157 es_node_157
   411      743.5gb     1.5tb     15.9tb     17.4tb            8 172.29.4.158 172.29.4.158 es_node_158
   411          1tb       1tb      9.9tb     10.9tb            9 172.29.4.177 172.29.4.177 es_node_177
   411      840.6gb     1.8tb     15.6tb     17.4tb           10 172.29.4.156 172.29.4.156 es_node_156
   248        9.3tb     9.3tb      1.5tb     10.9tb           85 172.29.4.178 172.29.4.178 es_node_178

假设我们希望下掉es_node_158_2这个节点, 则下面3条命令任选其一

curl -XPUT 127.0.0.1:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
  "transient" :{
    "cluster.routing.allocation.exclude._ip": "<node_ip_address>"
  }
}'


curl -XPUT 127.0.0.1:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
  "transient" :{
    "cluster.routing.allocation.exclude._name": "es_node_158_2"
  }
}'


curl -XPUT 127.0.0.1:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
  "transient" :{
    "cluster.routing.allocation.exclude._id": "<node_id>"
  }
}'

确认上面的命令执行成功

curl -XGET "127.0.0.1:9200/_cluster/settings?pretty=true"
{
  "persistent" : {
    "cluster" : {
      "max_shards_per_node" : "30000"
    },
    "indices" : {
      "breaker" : {
        "fielddata" : {
          "limit" : "20%"
        }
      }
    },
    "search" : {
      "max_buckets" : "87000"
    },
    "xpack" : {
      "monitoring" : {
        "collection" : {
          "enabled" : "true"
        }
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "all",
          "exclude" : {
            "_name" : "es_node_158_2"
          }
        }
      }
    }
  }
}

然后Elasticsearch会将es_node_158_2节点上的shards分配给其余节点. 再次查看shards allocation情况会发现es_node_158_2上面的shards数量在明显减少.

$ curl -XGET "127.0.0.1:9200/_cat/allocation?v"
shards disk.indices disk.used disk.avail disk.total disk.percent host         ip           node
   248        9.3tb     9.3tb      1.5tb     10.9tb           85 172.29.4.178 172.29.4.178 es_node_178
   438          1tb       1tb      9.9tb     10.9tb            9 172.29.4.177 172.29.4.177 es_node_177
   417      559.9gb   561.1gb     16.9tb     17.4tb            3 172.29.4.157 172.29.4.157 es_node_157
   441      963.1gb     1.8tb     15.6tb     17.4tb           10 172.29.4.156 172.29.4.156 es_node_156_2
   443      842.5gb     1.8tb     15.6tb     17.4tb           10 172.29.4.156 172.29.4.156 es_node_156
   443      747.1gb     1.5tb     15.9tb     17.4tb            8 172.29.4.158 172.29.4.158 es_node_158
   285      472.7gb     1.5tb     15.9tb     17.4tb            8 172.29.4.158 172.29.4.158 es_node_158_2  # shards开始减少

2, stop node and afterwork

等es_node_158_2上面的shards数量变为0的时候, 就可以登陆es_node_158_2并shutdown elasticsearch service了.

在es_node_158_2上面执行

$ systemctl stop elasticsearch
$ systemctl disable elasticsearch

在其它node上面执行

$ curl -XPUT 127.0.0.1:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
  "transient" :{
    "cluster.routing.allocation.exclude._name": null
  }
}'

参考文档: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#cluster-shard-allocation-filtering

Leave a Comment

使用python判断IP段可用IP及数量

使用python判断IP段可用IP及数量, 很简单.几个命令就可以(本文基于python3).

>>> import ipaddress
>>> bool(ipaddress.ip_address('172.21.97.12') in ipaddress.ip_network('172.16.0.0/12'))
True
>>>
>>> for ip in ipaddress.ip_network('192.168.0.0/28'):
...     print(ip)
...
192.168.0.0
192.168.0.1
192.168.0.2
192.168.0.3
192.168.0.4
192.168.0.5
192.168.0.6
192.168.0.7
192.168.0.8
192.168.0.9
192.168.0.10
192.168.0.11
192.168.0.12
192.168.0.13
192.168.0.14
192.168.0.15
>>>
>>> ipaddress.ip_network('192.168.0.0/28').num_addresses
16

批量计算

$ cat 2
172.16.128.0/18
172.16.32.0/20
172.16.64.0/19
172.19.192.0/19
172.16.240.0/21
172.16.48.0/20
172.16.192.0/19
172.19.160.0/19
172.19.64.0/18
172.16.24.0/21
172.16.96.0/19
172.19.128.0/19


$ python3
>>> import ipaddress
>>> with open("./2", "r") as f:
...     for i in f.readlines():
...         print(ipaddress.ip_network(i.rstrip()).num_addresses)
...
16384
4096
8192
8192
2048
4096
8192
8192
16384
2048
8192
8192

 

3 Comments

ElasticSearch DSL聚合查询语句

本来像聚合(aggregation)这种东西, 在Grafana中可以轻易的实现, 但是偶尔会有需求, 需要自己写DSL脚本实现一些功能, 于是, 只好自己动手了.

例子1

查询serverName=”dns-server-1″结果里, 按hostip的数量进行排序, 取前5

GET /my-service-2020.07.22/_search
{
  "query": {
    "term": { "serverName.keyword": "dns-server-1" }
  },
  "size" : 0,
  "aggs": {
    "top-10-hostip": {
      "terms": {
      	"field": "hostip.keyword",
        "size": 5
      }
    }
  }
}

结果

Leave a Comment

ElasticSearch索引/数据定期清理

关于定期清理ElasticSearch索引, 最简单粗暴的方法是写一个shell脚本, 实现定理删除INDEX. 但其实ElasticSearch官网也提供了一些工具来做这些事, 比如下面2个方法.

1, Crontab方式

$ crontab -l
00 05 * * * /usr/bin/curl -XDELETE localhost:9200/myindex-`date -d "-10days" +\%y\%m\%d`
30 05 * * * /usr/bin/curl -XDELETE localhost:9200/myindex-`date -d "-11days" +\%y\%m\%d`
00 06 * * * /usr/bin/curl -XDELETE localhost:9200/myindex-`date -d "-12days" +\%y\%m\%d`

2, ILM: Manage the index lifecycle

应该是最简单有用的清理INDEX的办法了(官方文档在此, 一个简单的范例在此), 是X-Pack自带的功能, 不需要安装额外工具. ILM的主要功能有

  1. 当index容量达到一定数值(例如50G), 或者其中的日志数量达到一定数值以后, 开启一个新index
  2. 定期把旧index移动到旧的硬件节点上
  3. 指定什么情况下可以修改replicas数量, 或者修改一个index的主分片数量, 或者指定什么情况可以Force merge segments
  4. 定期删除旧index

3, Curator

也是ElasticSearch官方的工具, 需要额外安装(下载地址).  这个工具最早是clearESindices.py演化而来的, 最早的目的就是清理删除Index, 再后来, 随着作者被Elasticsearch公司聘用, 这个工具也被更名为Elasticsearch Curator. 它使用yaml作为基础配置语法, 官网提供了一堆Example配置可以参考.

$ cat /etc/elasticsearch/curator-cfg.yml
client:
  hosts:
    - 172.29.4.158
    - 172.29.4.157
    - 172.29.4.156
  port: 9200
  use_ssl: False
  http_auth: "elastic:MyPassword"
  timeout: 30
logging:
  loglevel: INFO
  logformat: default
  logfile: /var/log/elasticsearch/curator.log
$ cat /etc/elasticsearch/curator-del.yml
actions:
  1:
    action: delete_indices
    description: >-
      Delete old system indexes.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: .monitoring-kibana-7-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 3
  2:
    action: delete_indices
    description: >-
      Delete old indexes.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: regex
      value: '^(office_dns_log-|office_dns_log_failover-|mail-|mail_failover-).*$'
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 180

dry-run运行试一下

/usr/bin/curator --config /etc/elasticsearch/curator-cfg.yml --dry-run /etc/elasticsearch/curator-del.yml

然后可以观察下/var/log/elasticsearch/curator.log文件里的提示. 确认没问题后, 将–dry-run去掉并写入crontab即可.

参考文档:
Automatically removing index

Leave a Comment