Mesos: Failed to get/update resource statistics for executor

Posted by guz6ccqo on 2021-06-26 in Mesos


Our Mesos agent logs are filling up with messages like these:

2018-06-19T07:31:05.247394+00:00 mesos-slave16 mesos-slave[10243]: W0619 07:31:05.244067 10249 slave.cpp:6750] Failed to get resource statistics for executor 'research_new-benchmarks_production_testbox-58-1529393461975-1-mesos_slave16' of framework Singularity-PROD: Failed to run 'docker -H unix:///var/run/docker.sock inspect mesos-7560fb72-28d3-4cce-8cb0-de889248cf93': exited with status 1; stderr='Error: No such object: mesos-7560fb72-28d3-4cce-8cb0-de889248cf93

or

2018-06-19T07:31:09.904414+00:00 mesos-slave16 mesos-slave[10243]: E0619 07:31:09.903687 10251 slave.cpp:4721] Failed to update resources for container b9a9f7f9-938b-4ec4-a245-331122471769 of executor 'hera_listening-api_production_checkAlert-93-1529393402085-1-mesos_slave16-us_west_2a' running task hera_listening-api_production_checkAlert-93-1529393402085-1-mesos_slave16 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/14447/cgroup: Failed to open file: No such file or directory
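To get a sense of how many of each message there are and which executors they belong to, a small tally script can help. This is only a rough sketch: the two regular expressions are derived from the two sample lines above and may need adjusting for other message variants, and the function name `tally` is made up for this illustration.

```python
import re
from collections import Counter

# Patterns derived from the two sample log lines quoted above (an assumption:
# other Mesos versions may word these messages differently).
STATS_RE = re.compile(r"Failed to get resource statistics for executor '([^']+)'")
UPDATE_RE = re.compile(
    r"Failed to update resources for container \S+ of executor '([^']+)'"
)


def tally(lines):
    """Count each failure kind and how often each executor ID appears."""
    counts = Counter()
    executors = Counter()
    for line in lines:
        match = STATS_RE.search(line)
        kind = "statistics"
        if match is None:
            match = UPDATE_RE.search(line)
            kind = "update"
        if match is None:
            continue  # unrelated log line
        counts[kind] += 1
        executors[match.group(1)] += 1
    return counts, executors
```

It accepts any iterable of lines, so an open file handle for the agent log (wherever that lives on the host) can be passed in directly.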

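We run three Mesos masters in HA with the Marathon and Singularity frameworks, and both frameworks have tasks running. Tasks run fine, and cron jobs (from Singularity) run too, but I am puzzled by the thousands of these messages. We have more than 600 long-running Marathon tasks, and 30+ cron jobs start every few minutes.
Docker version: 18.03.0-ce
Mesos version: 1.4.0-2.0.1
Marathon version: 1.4.2-1.0.647.ubuntu1604
Singularity version: 0.15.1
Masters and slaves run on Ubuntu 16.04 with the AWS kernel 4.4.0-1060-aws.
My guess is that the executor on the slave is removed once the task finishes, but Mesos still tries to get information about it from Docker, where the container is no longer visible.
Any ideas? Thanks.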
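One way to check that guess by hand is to run the same `docker inspect` command that appears in the warning and see how it fails. The sketch below assumes the socket path from the log line. The helper names `classify_inspect` and `inspect_container` are hypothetical, and the match on "No such object" is based solely on the stderr text quoted above. A "gone" result only shows that the container is absent at the time of the check. That is consistent with the guess but does not prove it.

```python
import subprocess


def classify_inspect(returncode, stderr):
    """Map a `docker inspect` exit status and stderr to a coarse state."""
    if returncode == 0:
        return "present"
    if "No such object" in stderr:
        return "gone"  # same failure as in the agent warning
    return "error"  # some other problem, e.g. the daemon is unreachable


def inspect_container(name, docker_host="unix:///var/run/docker.sock"):
    """Run `docker inspect` as shown in the agent log and classify the result."""
    proc = subprocess.run(
        ["docker", "-H", docker_host, "inspect", name],
        capture_output=True,
        text=True,
    )
    return classify_inspect(proc.returncode, proc.stderr)
```

For example, `inspect_container("mesos-7560fb72-28d3-4cce-8cb0-de889248cf93")` would run the command from the first warning on the affected slave.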

docker mesos mesosphere marathon

Source: https://stackoverflow.com/questions/50923797/mesos-failed-to-get-update-resource-statistics-for-executor