Elasticsearch cluster "master_not_discovered_exception"

Posted by xggvc2p6 on 2023-02-07 in ElasticSearch


I have installed Elasticsearch 2.2.3 and configured it as a cluster of 2 nodes.
Node 1 (elasticsearch.yml)

cluster.name: my-cluster
node.name: node1
bootstrap.mlockall: true
discovery.zen.ping.unicast.hosts: ["ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com", "ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com"]
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.multicast.enabled: false
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
node.master: true
node.data: true
http.cors.enabled: true
script.inline: false
script.indexed: false
network.bind_host: 0.0.0.0

Node 2 (elasticsearch.yml)

cluster.name: my-cluster
node.name: node2
bootstrap.mlockall: true
discovery.zen.ping.unicast.hosts: ["ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com", "ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com"]
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.multicast.enabled: false
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
node.master: false
node.data: true
http.cors.enabled: true
script.inline: false
script.indexed: false
network.bind_host: 0.0.0.0

If I run curl -XGET 'http://localhost:9200/_cluster/state?pretty', I get:

{
  "error" : {
    "root_cause" : [ {
      "type" : "master_not_discovered_exception",
      "reason" : null
    } ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}

The log of node 1 contains:

[2016-06-22 13:33:56,167][INFO ][cluster.service          ] [node1] new_master {node1}{Vwj4gI3STr6saeTxKkSqEw}{127.0.0.1}{127.0.0.1:9300}{master=true}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-06-22 13:33:56,210][INFO ][http                     ] [node1] publish_address {127.0.0.1:9200}, bound_addresses {[::]:9200}
[2016-06-22 13:33:56,210][INFO ][node                     ] [node1] started
[2016-06-22 13:33:56,221][INFO ][gateway                  ] [-node1] recovered [0] indices into cluster_state

The log of node 2, on the other hand, contains:

[2016-06-22 13:34:38,419][INFO ][discovery.zen            ] [node2] failed to send join request to master [{node1}{Vwj4gI3STr6saeTxKkSqEw}{127.0.0.1}{127.0.0.1:9300}{master=true}], reason [RemoteTransportException[[node2][127.0.0.1:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{node2}{_YUbBNx9RUuw854PKFe1CA}{127.0.0.1}{127.0.0.1:9300}{master=false}] not master for join request]; ]

Where is the error?

elasticsearch

Source: https://stackoverflow.com/questions/37970187/elasticsearch-cluster-master-not-discovered-exception

9 Answers


eoxn13cs1#

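I was using an AWS EC2 instance with CentOS 7.
My problem was that there was no IP route between the nodes. I had to open some firewall ports with the commands below, which solved the problem.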

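sudo firewall-cmd --permanent --add-port=8080/tcp
sudo firewall-cmd --permanent --add-port=9200/tcp
sudo firewall-cmd --permanent --add-port=9300/tcp
# --permanent only changes the saved configuration; reload to apply it to the running firewall
sudo firewall-cmd --reload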

2023-02-07

m0rkklqb2#

I resolved it with this line:
network.publish_host: ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com
Every elasticsearch.yml config file must contain this line, set to that node's own hostname.
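
The logs in the question show both nodes advertising 127.0.0.1:9300, which would explain why node2's join request ends up back at node2 itself. As a rough sketch (the function names are made up for illustration, and the pattern assumes the Elasticsearch 2.x log format quoted in the question), the following Python helper flags a log line whose published transport address is a loopback address:

```python
import ipaddress
import re

# Matches the "{ip}{ip:port}" pair printed for a node in the 2.x log lines
# quoted in the question, e.g. "{127.0.0.1}{127.0.0.1:9300}".
ADDRESS_PATTERN = re.compile(r"\{(\d{1,3}(?:\.\d{1,3}){3})\}\{\1:(\d+)\}")


def published_addresses(log_line):
    """Return every (ip, port) transport address found in a log line."""
    return [(ip, int(port)) for ip, port in ADDRESS_PATTERN.findall(log_line)]


def publishes_loopback(log_line):
    """Return True if any address in the line is a loopback address."""
    return any(
        ipaddress.ip_address(ip).is_loopback
        for ip, _ in published_addresses(log_line)
    )
```

If publishes_loopback returns True for a node's startup log, other machines cannot reach that node at the address it advertises, which is the situation network.publish_host is meant to fix.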

2023-02-07

a7qyws3x3#

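The root cause of the master not discovered exception is that the nodes cannot reach each other on port 9300. This has to work in both directions, i.e. node 1 must be able to connect to node 2 on port 9300 and vice versa.
Note: by default, Elasticsearch reserves ports 9300-9400 for cluster (transport) communication and ports 9200-9300 for access to the Elasticsearch HTTP API.
A simple telnet is enough to confirm this. From node 1, run telnet node2 9300.
If that succeeds, next try telnet node1 9300 from node 2.
When you get a master not discovered exception, at least one of these telnet attempts will fail.
If you don't have telnet installed, you can even use curl.
Hope this helps.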
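
If neither telnet nor curl is at hand, the same check can be sketched with Python's standard library. This is only an illustration of the test described above; the function name can_connect is made up here, and the hostname you pass in is whatever address the other node is supposed to be reachable at.

```python
import socket


def can_connect(host, port=9300, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run can_connect("node2") on node 1 and can_connect("node1") on node 2; both calls need to return True for the nodes to be able to form a cluster.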

2023-02-07

t0ybt7op4#

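This may be the reason the master is not discovered. If the EC2 instances are in the same VPC, provide the private IP in /etc/elasticsearch/elasticsearch.yml as follows (cluster.initial_master_nodes exists in Elasticsearch 7.x and later):
cluster.initial_master_nodes: ["<PRIVATE-IP>"]
Note: after changing the configuration above, restart the Elasticsearch service, e.g. sudo service elasticsearch stop followed by sudo service elasticsearch start (if the OS is Ubuntu).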

2023-02-07

eqzww0vc5#

If you are using Elasticsearch 7,
update the elasticsearch.yml file in /etc/elasticsearch:

node.name: "node-1" 

network.host: ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com

http.port: 9200

cluster.initial_master_nodes: ["node-1"]

Here node.name and the first value of cluster.initial_master_nodes should be the same.

2023-02-07

dnph8jn46#

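There are a lot of settings here that you either don't want (like fielddata) or don't need. Also, you are clearly using AWS EC2 instances, so you should use the cloud-aws plugin (broken up into separate plugins in ES 5.x). It provides a new discovery model that you can use instead of zen.
On each node, you need to install the cloud-aws plugin (assuming ES 2.x):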

$ bin/plugin install cloud-aws

Once it is installed on every node, you can use it to take advantage of the discovery-ec2 component:

# Guarantee that the plugin is installed
plugin.mandatory: cloud-aws

# Discovery / AWS EC2 Settings
discovery:
  type: ec2
  ec2:
    availability_zones: [ "us-east-1a", "us-east-1b" ]
    groups: [ "my_security_group1", "my_security_group2" ]

cloud:
  aws:
    access_key: AKVAIQBF2RECL7FJWGJQ
    secret_key: vExyMThREXeRMm/b/LRzEB8jWwvzQeXgjqMX+6br
    region: us-east-1
  node.auto_attributes: true

# Bind to the network on whatever IP you want to allow connections on.
# You _should_ only want to allow connections from within the network
# so you only need to bind to the private IP
network.host: _ec2:privateIp_

# You can bind to all hosts that are possible to communicate with the
# node but advertise it to other nodes via the private IP (less
# relevant because of the type of discovery used, but not a bad idea).
#network:
#  bind_host: [ _ec2:privateIp_, _ec2:publicIp_, _ec2:publicDns_ ]
#  publish_host: _ec2:privateIp_

# Node-specific settings (note: nodes default to be master and data nodes)
node:
  name: node1
  master: true
  data: true

# Constant settings
cluster.name: my-cluster
bootstrap.mlockall: true

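Ultimately, your problem is that the master election is failing for some reason, most likely a connectivity issue. The configuration above should take care of that, but you have another critical problem: your discovery.zen.minimum_master_nodes setting is wrong. You have two master-eligible nodes, but you are asking Elasticsearch to require only one for any election. That means that, when isolated from each other, each master-eligible node can decide that it has a quorum and elect itself on its own (giving you two masters and, in effect, two clusters). That is bad.
Therefore you must always set this setting to a quorum: (M / 2) + 1, rounded down, where M is the number of master-eligible nodes.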

M = 2
(2 / 2) + 1 = (1) + 1 = 2

With 3, 4 or 5 master-eligible nodes, this gives:

M = 3
(3 / 2) + 1 = (1.5) + 1 = 2.5 => 2

M = 4
(4 / 2) + 1 = (2) + 1 = 3

M = 5
(5 / 2) + 1 = (2.5) + 1 = 3.5 => 3
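
The same arithmetic can be written as a small Python helper. This is just a sketch of the quorum formula above; the function name is chosen for illustration and is not part of Elasticsearch.

```python
def minimum_master_nodes(master_eligible):
    """Quorum for M master-eligible nodes: (M / 2) + 1, rounded down."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division performs the rounding down.
    return master_eligible // 2 + 1


for m in (2, 3, 4, 5):
    print(f"M = {m}: discovery.zen.minimum_master_nodes = {minimum_master_nodes(m)}")
```

The loop prints 2, 2, 3 and 3 for M = 2, 3, 4 and 5, matching the worked values above.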

So, in your case, you should also set:

discovery.zen.minimum_master_nodes: 2

Note that you can add this as another line, or you can modify the discovery block from above (it really comes down to YAML style):

discovery:
  type: ec2
  ec2:
    availability_zones: [ "us-east-1a", "us-east-1b" ]
    groups: [ "my_security_group1", "my_security_group2" ]
  zen.minimum_master_nodes: 2

2023-02-07

3lxsmp7m7#

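The firewall on my system was turned on, which is why I got the same error. Once I turned the firewall off, everything worked fine. So make sure your firewall is off.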

2023-02-07

n7taea2i8#

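This error also occurs if the master starts up with an index from an older Elasticsearch version, while a worker starts with an empty index and initializes it with a newer version.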

2023-02-07

hjzp0vay9#

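Sandeep's answer above hinted to me that the nodes could not talk to each other. When I dug into it, I found that I was missing an inbound rule for TCP port 9300 in the EC2 security group. I added the rule and restarted the elasticsearch service on all nodes, and it started working.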
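
For anyone who prefers to script that rule rather than click through the console, here is a minimal sketch using boto3. It assumes boto3 is installed and AWS credentials are configured, that all nodes share one security group, and that traffic should only be allowed from members of that same group; the helper names and the group ID you pass in are placeholders, not values from this thread.

```python
def transport_ingress_rule(source_group_id, port=9300):
    """Build an inbound TCP rule that admits traffic from one security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Restrict the source to instances in the given security group.
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


def allow_transport_between_nodes(group_id, region):
    """Add the port 9300 rule to group_id, with the group itself as source."""
    import boto3  # third-party; imported here so the helper above needs nothing

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[transport_ingress_rule(group_id)],
    )
```

Calling allow_transport_between_nodes with your own group ID and region adds the rule; restart the elasticsearch service on the nodes afterwards, as described above.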

2023-02-07