I want to run a Docker container on Marathon (both Mesos and Marathon themselves run in Docker).
When I run the image directly with the docker run command, everything works fine.
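For reference, the direct invocation looks roughly like this (the image name is a placeholder for my actual one):

# runs indefinitely with no problem when started by hand
docker run --rm my-registry/jping:latest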
But when the image is launched through Marathon, the container is killed after one minute (60 seconds); Marathon then recreates the container, the new one is killed after another minute, and so on.
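The app definition posted to Marathon is essentially the following; the image name and Marathon address are placeholders, while the resource values match what the master log below reports for the task:

curl -X POST http://mesos1:8080/v2/apps \
  -H 'Content-Type: application/json' \
  -d '{
        "id": "/jping",
        "instances": 1,
        "cpus": 0.5,
        "mem": 128,
        "disk": 128,
        "container": {
          "type": "DOCKER",
          "docker": { "image": "my-registry/jping:latest" }
        }
      }'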
The strangest part is that I found the following error in the Mesos slave log:

Failed to update resources for container f891ffca-39c3-4b70-adae-2520864c42b2 of executor 'jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86' running task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/27321/cgroup: No such file or directory
I have researched this problem online, but most of the reported cases were solved by increasing memory. That did not work for me, even after I gave the task all the memory available.
Log from the Mesos slave:
I0703 06:05:25.992172 18 slave.cpp:5283] Handling status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 from executor(1)@mesos1:35786
E0703 06:05:26.070716 14 slave.cpp:5614] Failed to update resources for container f891ffca-39c3-4b70-adae-2520864c42b2 of executor 'jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86' running task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/27321/cgroup: No such file or directory
I0703 06:05:26.070905 19 docker.cpp:2331] Destroying container f891ffca-39c3-4b70-adae-2520864c42b2 in RUNNING state
I0703 06:05:26.070940 19 docker.cpp:2336] Sending SIGTERM to executor with pid: 792
I0703 06:05:26.070894 12 task_status_update_manager.cpp:328] Received task status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.070981 12 task_status_update_manager.cpp:842] Checkpointing UPDATE for task status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.071130 12 slave.cpp:5775] Forwarding the update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 to master@mesos1:5060
I0703 06:05:26.071277 12 slave.cpp:5684] Sending acknowledgement for status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 to executor(1)@mesos1:35786
I0703 06:05:26.073609 19 docker.cpp:2381] Running docker stop on container f891ffca-39c3-4b70-adae-2520864c42b2
I0703 06:05:26.076584 17 slave.cpp:5907] Got exited event for executor(1)@mesos1:35786
I0703 06:05:26.082994 12 task_status_update_manager.cpp:401] Received task status update acknowledgement (UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.083410 12 task_status_update_manager.cpp:842] Checkpointing ACK for task status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.171222 14 docker.cpp:2560] Executor for container f891ffca-39c3-4b70-adae-2520864c42b2 has exited
I0703 06:05:26.172829 12 slave.cpp:6305] Executor 'jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86' of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 terminated with signal Terminated
I0703 06:05:26.172868 12 slave.cpp:6403] Cleaning up executor 'jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86' of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 at executor(1)@mesos1:35786
I0703 06:05:26.173218 18 gc.cpp:90] Scheduling '/var/tmp/mesos/slaves/82f5f7aa-772c-48b8-b9e9-5675fe0b7fa9-S0/frameworks/6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003/executors/jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86/runs/f891ffca-39c3-4b70-adae-2520864c42b2' for gc 6.99999799573037days in the future
Log from the Mesos master:
I0703 06:05:26.071590 15 master.cpp:7962] Status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 from agent 82f5f7aa-772c-48b8-b9e9-5675fe0b7fa9-S0 at slave(1)@mesos1:5051 (mesos1)
I0703 06:05:26.071923 15 master.cpp:8018] Forwarding status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.072099 15 master.cpp:10278] Updating the state of task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 (latest state: TASK_FAILED, status update state: TASK_FAILED)
I0703 06:05:26.080749 15 master.cpp:5623] Processing REVIVE call for framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 (marathon) at scheduler-13cb0ef0-e5fb-40ac-aa5e-d8d7284e409b@mesos1:44408
I0703 06:05:26.080828 15 hierarchical.cpp:1339] Revived offers for roles { * } of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.081554 13 master.cpp:8870] Sending 1 offers to framework 2b59b774-1033-4f63-b403-fce174f8155b-0004 (Spark Cluster) at scheduler-48bf92fe-e69d-4af8-8d7e-e22cc5177d02@mesos3:46594
I0703 06:05:26.081902 13 master.cpp:8870] Sending 2 offers to framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 (marathon) at scheduler-13cb0ef0-e5fb-40ac-aa5e-d8d7284e409b@mesos1:44408
I0703 06:05:26.082159 12 http.cpp:1185] HTTP GET for /master/state?jsonp=angular.callbacks._3w from 10.1.21.12:65271 with User-Agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
I0703 06:05:26.082294 12 master.cpp:5877] Processing ACKNOWLEDGE call 4f55a18e-37ea-48fc-8f2d-2228f95a7097 for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 (marathon) at scheduler-13cb0ef0-e5fb-40ac-aa5e-d8d7284e409b@mesos1:44408 on agent 82f5f7aa-772c-48b8-b9e9-5675fe0b7fa9-S0
I0703 06:05:26.082341 12 master.cpp:10382] Removing task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 with resources cpus(allocated: *):0.5; mem(allocated: *):128; disk(allocated: *):128; ports(allocated: *):[50690-50690] of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 on agent 82f5f7aa-772c-48b8-b9e9-5675fe0b7fa9-S0 at slave(1)@mesos1:5051 (mesos1)
Log from Marathon:
[2018-07-03 06:05:26,073] INFO Received status update for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86: TASK_FAILED (Failed to get exit status of container) (mesosphere.marathon.MarathonScheduler:Thread-97)
[2018-07-03 06:05:26,075] INFO all tasks of instance [jping.marathon-eff7501c-7e86-11e8-aea3-22ffdeeedb86] are terminal, requesting to expunge (mesosphere.marathon.core.instance.update.InstanceUpdater$:marathon-akka.actor.default-dispatcher-16)
[2018-07-03 06:05:26,079] INFO Removed app [/jping] from tracker (mesosphere.marathon.core.task.tracker.InstanceTracker$InstancesBySpec:marathon-akka.actor.default-dispatcher-16)
[2018-07-03 06:05:26,080] INFO Increasing delay. Task launch delay for [/jping - 2018-07-03T03:49:41.174Z] is set to 2 seconds 313 milliseconds (mesosphere.marathon.core.launchqueue.impl.RateLimiter$:marathon-akka.actor.default-dispatcher-16)
[2018-07-03 06:05:26,080] INFO receiveInstanceUpdate: instance [jping.marathon-eff7501c-7e86-11e8-aea3-22ffdeeedb86] was deleted (Failed) (mesosphere.marathon.core.launchqueue.impl.TaskLauncherActor:marathon-akka.actor.default-dispatcher-19)
[2018-07-03 06:05:26,080] INFO Received reviveOffers notification: ReviveOffers$ (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-15)
[2018-07-03 06:05:26,080] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-15)
[2018-07-03 06:05:26,080] INFO initiating a scale check for runSpec [/jping] due to [instance [jping.marathon-eff7501c-7e86-11e8-aea3-22ffdeeedb86]] Failed (mesosphere.marathon.core.task.update.impl.steps.ScaleAppUpdateStepImpl:marathon-akka.actor.default-dispatcher-16)
[2018-07-03 06:05:26,080] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-15)
[2018-07-03 06:05:26,080] INFO => Schedule next revive at 2018-07-03T06:05:31.080Z in 5000 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-15)
[2018-07-03 06:05:26,080] INFO Acknowledge status update for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86: TASK_FAILED (Failed to get exit status of container) (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl:scala-execution-context-global-150)
[2018-07-03 06:05:26,081] INFO Need to scale /jping from 0 up to 1 instances (mesosphere.marathon.SchedulerActions:scheduler-actions-thread-0)
[2018-07-03 06:05:26,081] INFO Queueing 1 new instances for /jping to the already 0 queued ones (mesosphere.marathon.SchedulerActions:scheduler-actions-thread-0)
[2018-07-03 06:05:26,081] INFO add 1 instances to 0 instances to launch
Marathon task failure message (the screenshot shows a different task ID, but the message and the problem are the same):
Is there a similar issue I can refer to?
Marathon version: 1.6.352
Mesos version: 1.5.1
SOLVED

I finally found the problem.
The container task launched by Marathon was being killed because the Mesos slave runs inside a Docker container with its own PID namespace, so it did not know the PID of the process created on the host: a host PID such as 27321 has no /proc entry inside the agent's container, which is exactly the "Failed to read /proc/27321/cgroup" error above. The agent therefore destroyed the container, and that caused my problem.
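The namespace isolation is easy to see for yourself: without sharing the host's PID namespace, a container only sees its own processes, so a host PID like 27321 simply does not exist there:

# Only the container's own process (PID 1) is visible:
docker run --rm alpine ps
# With the host PID namespace shared, all host processes are visible:
docker run --rm --pid=host alpine ps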
Ever since I added --pid=host to the docker run command that starts the Mesos slave container, the problem has been solved.
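A sketch of how I now start the agent container; apart from --pid=host (the actual fix), the image, mounts, and environment variables below are illustrative and need to match your own setup:

# --pid=host shares the host PID namespace, so the agent can read
# /proc/<pid>/cgroup for processes started by the host Docker daemon
docker run -d --name mesos-slave \
  --pid=host \
  --net=host \
  --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  -v /var/tmp/mesos:/var/tmp/mesos \
  -e MESOS_MASTER=zk://mesos1:2181/mesos \
  -e MESOS_CONTAINERIZERS=docker,mesos \
  -e MESOS_WORK_DIR=/var/tmp/mesos \
  mesosphere/mesos-slave:1.5.1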
I found the solution here.