marathon无法更新容器的资源

7rtdyuoh  于 2021-06-26  发布在  Mesos
关注(0)|答案(0)|浏览(275)

我想在马拉松赛上跑码头集装箱(mesos和marathon都在docker上运行。)
当我直接用dockerrun命令运行image时,一切都很好。
但当通过marathon运行图像时,它会在一分钟(60秒)后被杀死,然后marathon重建容器并在一分钟后仍然被杀死等等。
最奇怪的是我在mesos slave中发现了以下日志
在终端任务的状态更新时,未能更新执行器“jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86”的容器f891ffca-39c3-4b70-adae-2520864c42b2的资源,该执行器正在运行任务jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86,销毁容器:无法确定“cpu”子系统的cgroup:无法读取/proc/27321/cgroup:没有此类文件或目录
我在网上研究过这个问题,但大多数都是通过增加记忆来解决的,这种方法对我不起作用,甚至我把所有的记忆都给了它。
登录mesos slave

I0703 06:05:25.992172    18 slave.cpp:5283] Handling status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 from executor(1)@mesos1:35786
E0703 06:05:26.070716    14 slave.cpp:5614] Failed to update resources for container f891ffca-39c3-4b70-adae-2520864c42b2 of executor 'jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86' running task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/27321/cgroup: No such file or directory
I0703 06:05:26.070905    19 docker.cpp:2331] Destroying container f891ffca-39c3-4b70-adae-2520864c42b2 in RUNNING state
I0703 06:05:26.070940    19 docker.cpp:2336] Sending SIGTERM to executor with pid: 792
I0703 06:05:26.070894    12 task_status_update_manager.cpp:328] Received task status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.070981    12 task_status_update_manager.cpp:842] Checkpointing UPDATE for task status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.071130    12 slave.cpp:5775] Forwarding the update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 to master@mesos1:5060
I0703 06:05:26.071277    12 slave.cpp:5684] Sending acknowledgement for status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 to executor(1)@mesos1:35786
I0703 06:05:26.073609    19 docker.cpp:2381] Running docker stop on container f891ffca-39c3-4b70-adae-2520864c42b2
I0703 06:05:26.076584    17 slave.cpp:5907] Got exited event for executor(1)@mesos1:35786
I0703 06:05:26.082994    12 task_status_update_manager.cpp:401] Received task status update acknowledgement (UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.083410    12 task_status_update_manager.cpp:842] Checkpointing ACK for task status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.171222    14 docker.cpp:2560] Executor for container f891ffca-39c3-4b70-adae-2520864c42b2 has exited
I0703 06:05:26.172829    12 slave.cpp:6305] Executor 'jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86' of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 terminated with signal Terminated
I0703 06:05:26.172868    12 slave.cpp:6403] Cleaning up executor 'jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86' of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 at executor(1)@mesos1:35786
I0703 06:05:26.173218    18 gc.cpp:90] Scheduling '/var/tmp/mesos/slaves/82f5f7aa-772c-48b8-b9e9-5675fe0b7fa9-S0/frameworks/6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003/executors/jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86/runs/f891ffca-39c3-4b70-adae-2520864c42b2' for gc 6.99999799573037days in the future

登录mesos master

I0703 06:05:26.071590    15 master.cpp:7962] Status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 from agent 82f5f7aa-772c-48b8-b9e9-5675fe0b7fa9-S0 at slave(1)@mesos1:5051 (mesos1)
I0703 06:05:26.071923    15 master.cpp:8018] Forwarding status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.072099    15 master.cpp:10278] Updating the state of task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 (latest state: TASK_FAILED, status update state: TASK_FAILED)
I0703 06:05:26.080749    15 master.cpp:5623] Processing REVIVE call for framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 (marathon) at scheduler-13cb0ef0-e5fb-40ac-aa5e-d8d7284e409b@mesos1:44408
I0703 06:05:26.080828    15 hierarchical.cpp:1339] Revived offers for roles { * } of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.081554    13 master.cpp:8870] Sending 1 offers to framework 2b59b774-1033-4f63-b403-fce174f8155b-0004 (Spark Cluster) at scheduler-48bf92fe-e69d-4af8-8d7e-e22cc5177d02@mesos3:46594
I0703 06:05:26.081902    13 master.cpp:8870] Sending 2 offers to framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 (marathon) at scheduler-13cb0ef0-e5fb-40ac-aa5e-d8d7284e409b@mesos1:44408
I0703 06:05:26.082159    12 http.cpp:1185] HTTP GET for /master/state?jsonp=angular.callbacks._3w from 10.1.21.12:65271 with User-Agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
I0703 06:05:26.082294    12 master.cpp:5877] Processing ACKNOWLEDGE call 4f55a18e-37ea-48fc-8f2d-2228f95a7097 for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 (marathon) at scheduler-13cb0ef0-e5fb-40ac-aa5e-d8d7284e409b@mesos1:44408 on agent 82f5f7aa-772c-48b8-b9e9-5675fe0b7fa9-S0
I0703 06:05:26.082341    12 master.cpp:10382] Removing task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 with resources cpus(allocated: *):0.5; mem(allocated: *):128; disk(allocated: *):128; ports(allocated: *):[50690-50690] of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 on agent 82f5f7aa-772c-48b8-b9e9-5675fe0b7fa9-S0 at slave(1)@mesos1:5051 (mesos1)

登录马拉松

[2018-07-03 06:05:26,073] INFO  Received status update for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86: TASK_FAILED (Failed to get exit status of container) (mesosphere.marathon.MarathonScheduler:Thread-97)
[2018-07-03 06:05:26,075] INFO  all tasks of instance [jping.marathon-eff7501c-7e86-11e8-aea3-22ffdeeedb86] are terminal, requesting to expunge (mesosphere.marathon.core.instance.update.InstanceUpdater$:marathon-akka.actor.default-dispatcher-16)
[2018-07-03 06:05:26,079] INFO  Removed app [/jping] from tracker (mesosphere.marathon.core.task.tracker.InstanceTracker$InstancesBySpec:marathon-akka.actor.default-dispatcher-16)
[2018-07-03 06:05:26,080] INFO  Increasing delay. Task launch delay for [/jping - 2018-07-03T03:49:41.174Z] is set to 2 seconds 313 milliseconds (mesosphere.marathon.core.launchqueue.impl.RateLimiter$:marathon-akka.actor.default-dispatcher-16)
[2018-07-03 06:05:26,080] INFO  receiveInstanceUpdate: instance [jping.marathon-eff7501c-7e86-11e8-aea3-22ffdeeedb86] was deleted (Failed) (mesosphere.marathon.core.launchqueue.impl.TaskLauncherActor:marathon-akka.actor.default-dispatcher-19)
[2018-07-03 06:05:26,080] INFO  Received reviveOffers notification: ReviveOffers$ (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-15)
[2018-07-03 06:05:26,080] INFO  => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-15)
[2018-07-03 06:05:26,080] INFO  initiating a scale check for runSpec [/jping] due to [instance [jping.marathon-eff7501c-7e86-11e8-aea3-22ffdeeedb86]] Failed (mesosphere.marathon.core.task.update.impl.steps.ScaleAppUpdateStepImpl:marathon-akka.actor.default-dispatcher-16)
[2018-07-03 06:05:26,080] INFO  2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-15)
[2018-07-03 06:05:26,080] INFO  => Schedule next revive at 2018-07-03T06:05:31.080Z in 5000 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-15)
[2018-07-03 06:05:26,080] INFO  Acknowledge status update for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86: TASK_FAILED (Failed to get exit status of container) (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl:scala-execution-context-global-150)
[2018-07-03 06:05:26,081] INFO  Need to scale /jping from 0 up to 1 instances (mesosphere.marathon.SchedulerActions:scheduler-actions-thread-0)
[2018-07-03 06:05:26,081] INFO  Queueing 1 new instances for /jping to the already 0 queued ones (mesosphere.marathon.SchedulerActions:scheduler-actions-thread-0)
[2018-07-03 06:05:26,081] INFO  add 1 instances to 0 instances to launch

马拉松任务失败消息(此图片显示另一个任务id,但消息和问题相同)
有没有类似的问题我可以参考?
马拉松版本:1.6.352
mesos版本:1.5.1

已解决

我终于找到问题了。
marathon容器任务被终止的原因是docker不知道marathon创建的pid。所以docker杀死了马拉松创建的pid。这导致了我的问题。
自从我在mesos slave上添加了带有--pid=host的docker run命令之后,这个问题就解决了。
我在这里找到了解决办法

暂无答案!

目前还没有任何答案,快来回答吧!

相关问题