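I set up a small test Mesos cluster following this guide: https://dcos.io/docs/1.8/administration/installing/custom/cli/ and everything went smoothly. The cluster has only 3 nodes: one bootstrap node, one master (10.7.1.12), and one agent (10.7.1.13).
But after I rebooted the machine running the agent, the master no longer sees it.
In `/var/log/mesos/mesos-agent.log` the last entry has a timestamp from before the reboot. I tried the steps from https://dcos.io/docs/1.8/administration/installing/custom/troubleshooting/ but nothing changed.
Here are the master's logs after the agent disconnected (`sudo journalctl -u dcos-mesos-master`):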
lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556001 2671 master.cpp:1245] Agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13) disconnected
lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556089 2671 master.cpp:2784] Disconnecting agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556170 2671 master.cpp:2803] Deactivating agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.926198 2670 master.cpp:5334] Shutting down agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13) with message 'health check timed out'
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.926230 2670 master.cpp:6617] Removing agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13): health check timed out
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.926507 2670 master.cpp:6910] Removing task 93f4b075-1338-4a84-afd6-6932cfe44c30 with resources mem(arangodb31, arangodb3):2048; cpus(arangodb31, arangodb3):0.25; disk(arangodb31, arangodb3)[AGENCY_991972e5-2d83-4710-ba3c-de8cf02303ab:myPersistentVolume]:2048; ports(arangodb31, arangodb3):[1026-1026] of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 on agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.926695 2670 master.cpp:6910] Removing task 644b59eb-fb20-43fd-a7c1-b1d9406cbfcb with resources mem(arangodb3, arangodb3):2048; cpus(arangodb3, arangodb3):0.25; disk(arangodb3, arangodb3)[AGENCY_0c76702f-ae8b-423c-83a8-1b6e2af8b723:myPersistentVolume]:2048; ports(arangodb3, arangodb3):[1025-1025] of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 on agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928460 2670 master.cpp:6736] Removed agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13): health check timed out
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928472 2670 master.cpp:5197] Sending status update TASK_LOST for task 93f4b075-1338-4a84-afd6-6932cfe44c30 of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 'Slave 10.7.1.13 removed: health check timed out'
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928486 2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 (arangodb3-1) at scheduler-f4f3a3f0-2261-4aaf-9390-81f4b1cc6d20@10.7.1.13:25366
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928611 2670 master.cpp:5197] Sending status update TASK_LOST for task 644b59eb-fb20-43fd-a7c1-b1d9406cbfcb of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 'Slave 10.7.1.13 removed: health check timed out'
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928638 2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 (arangodb3) at scheduler-7870582e-becd-4747-aeba-0217e91d537e@10.7.1.13:19866
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928747 2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0003 (arangodb3-standalone) at scheduler-180f6695-f3c9-4da6-80e8-d1dc633ec737@10.7.1.13:3583 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928761 2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0003 (arangodb3-standalone) at scheduler-180f6695-f3c9-4da6-80e8-d1dc633ec737@10.7.1.13:3583
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928894 2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 (arangodb3) at scheduler-7870582e-becd-4747-aeba-0217e91d537e@10.7.1.13:19866 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928905 2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 (arangodb3) at scheduler-7870582e-becd-4747-aeba-0217e91d537e@10.7.1.13:19866
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928921 2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 (arangodb3-1) at scheduler-f4f3a3f0-2261-4aaf-9390-81f4b1cc6d20@10.7.1.13:25366 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928928 2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 (arangodb3-1) at scheduler-f4f3a3f0-2261-4aaf-9390-81f4b1cc6d20@10.7.1.13:25366
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928941 2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0001 (metronome) at scheduler-4eb937a4-9a64-4a47-9245-3858defe691a@10.7.1.12:41077 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928963 2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0000 (marathon) at scheduler-02bf4e29-4dd7-4cf8-b14b-4a064b4d082c@10.7.1.12:43643 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
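The rest of the journals (journalctl…) are empty. This error also appears in the ZooKeeper logs: ![](https://i.stack.imgur.com/TmHnu.png)
Any suggestions on how to investigate further would be appreciated.
Edit: I fixed it by starting the `dcos-mesos-slave` service (before that I had to start the `dcos-spartan` and `dcos-gen-resolvconf` services). Do you know why it does not start automatically?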
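For readers who hit the same situation, the manual workaround described in the edit could be scripted roughly as follows. This is only a sketch: the unit names and their order come from the question, while the `start_units` function and the `SYSTEMCTL` override variable are hypothetical names introduced here for illustration.

```shell
# Start the units named in the question, in the order they had to be started.
# SYSTEMCTL is a hypothetical override (e.g. SYSTEMCTL=echo) for a dry run;
# by default the function calls "sudo systemctl".
start_units() {
  for unit in dcos-spartan dcos-gen-resolvconf dcos-mesos-slave; do
    # Stop at the first unit that fails to start.
    ${SYSTEMCTL:-sudo systemctl} start "$unit" || return 1
  done
}
```

Running `SYSTEMCTL=echo start_units` prints the commands without executing them; calling `start_units` on the agent node performs the actual start.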
1 Answer
> Do you know why it does not start automatically?
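According to the rules for using systemd reliably, the systemd units do not depend on each other, so you need to start everything manually.
`Requires=` and `Wants=` are not allowed: if the unit that is depended upon fails, the unit that depends on it will never be tried again.
`Before=` and `After=` are discouraged: they are not strong guarantees, so the software itself needs to check that its prerequisites are working.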
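One way to picture "the software checks its prerequisites" is a small retry loop that polls a check command before going on. The sketch below is an illustration only, not how DC/OS implements it; `wait_for` and its arguments are hypothetical names introduced here.

```shell
# wait_for MAX_TRIES DELAY_SECONDS CMD [ARGS...]
# Re-run CMD until it succeeds or MAX_TRIES attempts have been made.
# Returns 0 as soon as CMD succeeds, 1 if it never does.
wait_for() {
  _wf_tries=$1
  _wf_delay=$2
  shift 2
  _wf_i=0
  while [ "$_wf_i" -lt "$_wf_tries" ]; do
    if "$@"; then
      return 0
    fi
    _wf_i=$((_wf_i + 1))
    sleep "$_wf_delay"
  done
  return 1
}
```

For example, `wait_for 30 2 systemctl is-active --quiet dcos-spartan && sudo systemctl start dcos-mesos-slave` would wait up to about a minute for the prerequisite unit to report active before starting the agent; the timeout values are arbitrary.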