I have a 6-machine cluster. The machines are:
HOST             MEM (GB)  CPU
mesos-primary-1  8         2
mesos-primary-2  8         2
mesos-primary-3  8         2
mesos-worker-1   1         1
mesos-worker-2   1         1
mesos-worker-3   1         1
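My quorum size is set to 2.
The ids of the masters are 1, 2, and 3 respectively. In the web UI, I visited mesos-primary-1, mesos-primary-2, and mesos-primary-3 on port 5050, and I did not get a redirect from any of the IPs.
The lack of a redirect leads me to believe that each machine thinks it has its own quorum or something similar, which is why they fail to see each other and elect a leader.
Accessing port 8080 on any of the machines gives an error because no leader has been elected, but the address does resolve.
$ cat /etc/mesos-master/quorum
outputs 2 on every master.
I have also stopped and restarted everything. On the masters: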
$ sudo service mesos-master stop
$ sudo service marathon stop
$ sudo service zookeeper stop
$ sudo service mesos-master start
$ sudo service marathon start
$ sudo service zookeeper start
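On each slave:
$ sudo service mesos-slave stop
$ sudo service mesos-slave start
Still, none of the slaves are discovered and no leader is elected.
My logs are clean on all 3 IPs (I can reach each IP directly because there is no redirect). You can view each of them here:
mesos-primary-1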
Log file created at: 2015/10/02 11:00:01
Running on machine: mesos-primary-2
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1002 11:00:01.532337 13722 logging.cpp:172] INFO level logging started!
I1002 11:00:01.532865 13722 main.cpp:229] Build: 2015-09-25 19:13:24 by root
I1002 11:00:01.532894 13722 main.cpp:231] Version: 0.24.1
I1002 11:00:01.532903 13722 main.cpp:234] Git tag: 0.24.1
I1002 11:00:01.532909 13722 main.cpp:238] Git SHA: 44873806c2bb55da37e9adbece938274d8cd7c48
I1002 11:00:01.533020 13722 main.cpp:252] Using 'HierarchicalDRF' allocator
I1002 11:00:01.546877 13722 leveldb.cpp:176] Opened db in 13.691496ms
I1002 11:00:01.550370 13722 leveldb.cpp:183] Compacted db in 2.522303ms
I1002 11:00:01.550559 13722 leveldb.cpp:198] Created db iterator in 118591ns
I1002 11:00:01.550618 13722 leveldb.cpp:204] Seeked to beginning of db in 1151ns
I1002 11:00:01.550642 13722 leveldb.cpp:273] Iterated through 0 keys in the db in 767ns
I1002 11:00:01.551029 13722 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1002 11:00:01.553994 13743 log.cpp:238] Attempting to join replica to ZooKeeper group
I1002 11:00:01.556193 13740 recover.cpp:449] Starting replica recovery
I1002 11:00:01.561755 13722 main.cpp:465] Starting Mesos master
I1002 11:00:01.563489 13740 recover.cpp:475] Replica is in EMPTY status
I1002 11:00:01.568989 13722 master.cpp:378] Master 20151002-110001-2874854303-5050-13722 (159.203.90.171) started on 159.203.90.171:5050
I1002 11:00:01.569059 13722 master.cpp:380] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname="159.203.90.171" --initialize_driver_logging="true" --ip="159.203.90.171" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050" --quiet="false" --quorum="2" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk="zk://159.203.90.171:2181,104.131.35.19:2181,104.131.117.124:2181/mesos" --zk_session_timeout="10secs"
I1002 11:00:01.569535 13722 master.cpp:427] Master allowing unauthenticated frameworks to register
I1002 11:00:01.569581 13722 master.cpp:432] Master allowing unauthenticated slaves to register
I1002 11:00:01.569608 13722 master.cpp:469] Using default 'crammd5' authenticator
W1002 11:00:01.569718 13722 authenticator.cpp:505] No credentials provided, authentication requests will be refused.
I1002 11:00:01.570199 13722 authenticator.cpp:512] Initializing server SASL
I1002 11:00:01.582969 13722 master.cpp:1464] Successfully attached file '/var/log/mesos/mesos-master.INFO'
I1002 11:00:01.584786 13743 contender.cpp:149] Joining the ZK group
I1002 11:00:11.573873 13747 recover.cpp:111] Unable to finish the recover protocol in 10secs, retrying
I1002 11:01:06.547200 13743 http.cpp:321] HTTP GET for /master/state.json from 173.243.85.102:51963 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'
mesos-primary-2
Log file created at: 2015/10/02 11:00:01
Running on machine: mesos-primary-2
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1002 11:00:01.532337 13722 logging.cpp:172] INFO level logging started!
I1002 11:00:01.532865 13722 main.cpp:229] Build: 2015-09-25 19:13:24 by root
I1002 11:00:01.532894 13722 main.cpp:231] Version: 0.24.1
I1002 11:00:01.532903 13722 main.cpp:234] Git tag: 0.24.1
I1002 11:00:01.532909 13722 main.cpp:238] Git SHA: 44873806c2bb55da37e9adbece938274d8cd7c48
I1002 11:00:01.533020 13722 main.cpp:252] Using 'HierarchicalDRF' allocator
I1002 11:00:01.546877 13722 leveldb.cpp:176] Opened db in 13.691496ms
I1002 11:00:01.550370 13722 leveldb.cpp:183] Compacted db in 2.522303ms
I1002 11:00:01.550559 13722 leveldb.cpp:198] Created db iterator in 118591ns
I1002 11:00:01.550618 13722 leveldb.cpp:204] Seeked to beginning of db in 1151ns
I1002 11:00:01.550642 13722 leveldb.cpp:273] Iterated through 0 keys in the db in 767ns
I1002 11:00:01.551029 13722 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1002 11:00:01.553994 13743 log.cpp:238] Attempting to join replica to ZooKeeper group
I1002 11:00:01.556193 13740 recover.cpp:449] Starting replica recovery
I1002 11:00:01.561755 13722 main.cpp:465] Starting Mesos master
I1002 11:00:01.563489 13740 recover.cpp:475] Replica is in EMPTY status
I1002 11:00:01.568989 13722 master.cpp:378] Master 20151002-110001-2874854303-5050-13722 (159.203.90.171) started on 159.203.90.171:5050
I1002 11:00:01.569059 13722 master.cpp:380] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname="159.203.90.171" --initialize_driver_logging="true" --ip="159.203.90.171" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050" --quiet="false" --quorum="2" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk="zk://159.203.90.171:2181,104.131.35.19:2181,104.131.117.124:2181/mesos" --zk_session_timeout="10secs"
I1002 11:00:01.569535 13722 master.cpp:427] Master allowing unauthenticated frameworks to register
I1002 11:00:01.569581 13722 master.cpp:432] Master allowing unauthenticated slaves to register
I1002 11:00:01.569608 13722 master.cpp:469] Using default 'crammd5' authenticator
W1002 11:00:01.569718 13722 authenticator.cpp:505] No credentials provided, authentication requests will be refused.
I1002 11:00:01.570199 13722 authenticator.cpp:512] Initializing server SASL
I1002 11:00:01.582969 13722 master.cpp:1464] Successfully attached file '/var/log/mesos/mesos-master.INFO'
I1002 11:00:01.584786 13743 contender.cpp:149] Joining the ZK group
I1002 11:00:11.573873 13747 recover.cpp:111] Unable to finish the recover protocol in 10secs, retrying
mesos-primary-3
Log file created at: 2015/10/02 11:00:12
Running on machine: mesos-primary-3
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1002 11:00:12.609675 17105 logging.cpp:172] INFO level logging started!
I1002 11:00:12.610414 17105 main.cpp:229] Build: 2015-09-25 19:13:24 by root
I1002 11:00:12.610452 17105 main.cpp:231] Version: 0.24.1
I1002 11:00:12.610468 17105 main.cpp:234] Git tag: 0.24.1
I1002 11:00:12.610483 17105 main.cpp:238] Git SHA: 44873806c2bb55da37e9adbece938274d8cd7c48
I1002 11:00:12.610576 17105 main.cpp:252] Using 'HierarchicalDRF' allocator
I1002 11:00:12.618232 17105 leveldb.cpp:176] Opened db in 7.382537ms
I1002 11:00:12.619810 17105 leveldb.cpp:183] Compacted db in 1.512691ms
I1002 11:00:12.619876 17105 leveldb.cpp:198] Created db iterator in 27030ns
I1002 11:00:12.619910 17105 leveldb.cpp:204] Seeked to beginning of db in 1254ns
I1002 11:00:12.619925 17105 leveldb.cpp:273] Iterated through 0 keys in the db in 339ns
I1002 11:00:12.620028 17105 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1002 11:00:12.620930 17125 log.cpp:238] Attempting to join replica to ZooKeeper group
I1002 11:00:12.621615 17128 recover.cpp:449] Starting replica recovery
I1002 11:00:12.626735 17105 main.cpp:465] Starting Mesos master
I1002 11:00:12.627024 17128 recover.cpp:475] Replica is in EMPTY status
I1002 11:00:12.633635 17123 master.cpp:378] Master 20151002-110012-321094504-5050-17105 (104.131.35.19) started on 104.131.35.19:5050
I1002 11:00:12.633828 17123 master.cpp:380] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname="104.131.35.19" --initialize_driver_logging="true" --ip="104.131.35.19" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050" --quiet="false" --quorum="2" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk="zk://159.203.90.171:2181,104.131.35.19:2181,104.131.117.124:2181/mesos" --zk_session_timeout="10secs"
I1002 11:00:12.635736 17123 master.cpp:427] Master allowing unauthenticated frameworks to register
I1002 11:00:12.635771 17123 master.cpp:432] Master allowing unauthenticated slaves to register
I1002 11:00:12.635802 17123 master.cpp:469] Using default 'crammd5' authenticator
W1002 11:00:12.635835 17123 authenticator.cpp:505] No credentials provided, authentication requests will be refused.
I1002 11:00:12.636078 17123 authenticator.cpp:512] Initializing server SASL
I1002 11:00:12.643378 17125 contender.cpp:149] Joining the ZK group
I1002 11:00:12.643826 17123 master.cpp:1464] Successfully attached file '/var/log/mesos/mesos-master.INFO'
I1002 11:00:22.633390 17130 recover.cpp:111] Unable to finish the recover protocol in 10secs, retrying
I set up the machines following the guidelines given in the DigitalOcean guide.
Running
MASTER=$(mesos-resolve `cat /etc/mesos/zk`)
mesos-execute --master=$MASTER --name="cluster-test" --command="sleep 5"
yields:
2015-10-02 12:30:26,137:14558(0x7f8dbb743700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-10-02 12:30:26,141:14558(0x7f8dbb743700):ZOO_INFO@log_env@716: Client environment:host.name=mesos-primary-1
2015-10-02 12:30:26,141:14558(0x7f8dbb743700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-10-02 12:30:26,141:14558(0x7f8dbb743700):ZOO_INFO@log_env@724: Client environment:os.arch=3.13.0-57-generic
2015-10-02 12:30:26,141:14558(0x7f8dbb743700):ZOO_INFO@log_env@725: Client environment:os.version=#95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015
2015-10-02 12:30:26,141:14558(0x7f8dbb743700):ZOO_INFO@log_env@733: Client environment:user.name=root
2015-10-02 12:30:26,141:14558(0x7f8dbb743700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2015-10-02 12:30:26,141:14558(0x7f8dbb743700):ZOO_INFO@log_env@753: Client environment:user.dir=/root
2015-10-02 12:30:26,142:14558(0x7f8dbb743700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=159.203.90.171:2181,104.131.35.19:2181,104.131.117.124:2181 sessionTimeout=10000 watcher=0x7f8dc3625610 sessionId=0 sessionPasswd=<null> context=0x7f8da8003960 flags=0
2015-10-02 12:30:26,142:14558(0x7f8db6eff700):ZOO_INFO@check_events@1703: initiated connection to server [104.131.35.19:2181]
2015-10-02 12:30:26,144:14558(0x7f8db6eff700):ZOO_ERROR@handle_socket_error_msg@1721: Socket [104.131.35.19:2181] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
2015-10-02 12:30:26,144:14558(0x7f8db6eff700):ZOO_INFO@check_events@1703: initiated connection to server [104.131.117.124:2181]
2015-10-02 12:30:26,144:14558(0x7f8db6eff700):ZOO_ERROR@handle_socket_error_msg@1721: Socket [104.131.117.124:2181] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
2015-10-02 12:30:26,145:14558(0x7f8db6eff700):ZOO_INFO@check_events@1703: initiated connection to server [159.203.90.171:2181]
2015-10-02 12:30:26,147:14558(0x7f8db6eff700):ZOO_ERROR@handle_socket_error_msg@1721: Socket [159.203.90.171:2181] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
2015-10-02 12:30:29,484:14558(0x7f8db6eff700):ZOO_INFO@check_events@1703: initiated connection to server [104.131.35.19:2181]
2015-10-02 12:30:29,485:14558(0x7f8db6eff700):ZOO_ERROR@handle_socket_error_msg@1721: Socket [104.131.35.19:2181] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
2015-10-02 12:30:29,485:14558(0x7f8db6eff700):ZOO_INFO@check_events@1703: initiated connection to server [104.131.117.124:2181]
2015-10-02 12:30:29,486:14558(0x7f8db6eff700):ZOO_ERROR@handle_socket_error_msg@1721: Socket [104.131.117.124:2181] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
2015-10-02 12:30:29,487:14558(0x7f8db6eff700):ZOO_INFO@check_events@1703: initiated connection to server [159.203.90.171:2181]
2015-10-02 12:30:29,488:14558(0x7f8db6eff700):ZOO_ERROR@handle_socket_error_msg@1721: Socket [159.203.90.171:2181] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
Failed to detect master from 'zk://159.203.90.171:2181,104.131.35.19:2181,104.131.117.124:2181/mesos' within 5secs
root@mesos-primary-1:~# mesos-execute --master=$MASTER --name="cluster-test" --command="sleep 5"
Does anyone have any ideas?
1 Answer
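It looks to me like either your machines cannot reach each other, or the required ports are blocked on some or all of them. Make sure that:
a. Ports 2181 (zookeeper), 2888 and 3888 (follower connections and leader election, respectively), and 5050 (mesos) / 8080 (if you are using marathon) are unblocked; the last two also need to be reachable from your desktop/laptop for the UI. I believe the slaves only need 2888 to be reachable from the masters.
b. You can ping all the other masters from one machine first, i.e. log in to master 1 and ping masters 2 and 3.
c. You debug the masters that make up the cluster properly before worrying about the slaves.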
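Checks (a) and (b) can be scripted. Below is a minimal sketch, not a definitive tool: the function names `port_label`, `check_port`, and `check_hosts` and the file name `check_ports.sh` are made up for illustration, and it assumes an `nc` that supports `-z` and `-w` (for example the OpenBSD netcat that Ubuntu ships). The port list is the one from point (a).

```shell
#!/bin/sh
# Sketch: report whether the ports from point (a) accept TCP connections.
# Assumption: `nc` supports -z (connect without sending data) and
# -w (timeout in seconds). All function names here are illustrative.

# Map a port number to the service it is used for in this setup.
port_label() {
    case "$1" in
        2181) echo "zookeeper client" ;;
        2888) echo "zookeeper follower connections" ;;
        3888) echo "zookeeper leader election" ;;
        5050) echo "mesos master" ;;
        8080) echo "marathon" ;;
        *)    echo "unknown" ;;
    esac
}

# Return 0 if host:port accepts a TCP connection within 2 seconds.
check_port() {
    nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

# Probe every port on every host passed as an argument.
check_hosts() {
    for host in "$@"; do
        for port in 2181 2888 3888 5050 8080; do
            if check_port "$host" "$port"; then
                status="open"
            else
                status="blocked or not listening"
            fi
            printf '%-16s %-5s %-32s %s\n' \
                "$host" "$port" "$(port_label "$port")" "$status"
        done
    done
}

# Only probe when hosts are given on the command line.
if [ "$#" -gt 0 ]; then
    check_hosts "$@"
fi
```

Run it from each master against the other masters, e.g. `sh check_ports.sh 104.131.35.19 104.131.117.124` from 159.203.90.171. As far as I know, 2888 is only listened on by the current ZooKeeper leader, so seeing it closed on the followers is not by itself a problem; 2181 and 3888 being unreachable is the more telling sign.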
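You seem to have a good set of configs here and a correct quorum setting. Once you have established that the machines can connect to each other, you can look into other potential issues. Let us know how it goes!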
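For reference, the quorum should be a strict majority of the masters, which is why 2 is right for three of them. A small sketch of that arithmetic, assuming the quorum file lives at /etc/mesos-master/quorum as in the question (`quorum_for` is a name I made up):

```shell
#!/bin/sh
# Sketch: a majority quorum for N masters is floor(N/2) + 1.
# quorum_for is an illustrative helper, not part of Mesos.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

masters=3   # mesos-primary-1, mesos-primary-2, mesos-primary-3
expected=$(quorum_for "$masters")
echo "masters=$masters quorum=$expected"

# Compare with the configured value if the file exists on this host.
if [ -r /etc/mesos-master/quorum ]; then
    actual=$(cat /etc/mesos-master/quorum)
    if [ "$actual" = "$expected" ]; then
        echo "quorum file OK"
    else
        echo "quorum file has $actual, expected $expected"
    fi
fi
```

With three masters this prints `masters=3 quorum=2`, matching the `--quorum="2"` flag in the logs above.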