I want to run a Cassandra container from a Nomad job. It seems to start, but it dies after a few seconds (it appears to be killed by Nomad itself).
If I run the container from the command line, with:
docker run --name some-cassandra -p 9042:9042 -d cassandra:3.0
the container starts flawlessly. But if I create a Nomad job like this one:
job "cassandra" {
  datacenters = ["dc1"]
  type        = "service"

  update {
    max_parallel      = 1
    min_healthy_time  = "10s"
    healthy_deadline  = "5m"
    progress_deadline = "10m"
    auto_revert       = false
    canary            = 0
  }

  migrate {
    max_parallel     = 1
    health_check     = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }

  group "cassandra" {
    restart {
      attempts = 2
      interval = "240s"
      delay    = "120s"
      mode     = "delay"
    }

    task "cassandra" {
      driver = "docker"

      config {
        image        = "cassandra:3.0"
        network_mode = "bridge"

        port_map {
          cql = 9042
        }
      }

      resources {
        memory = 2048
        cpu    = 800

        network {
          port "cql" {}
        }
      }

      env {
        CASSANDRA_LISTEN_ADDRESS = "${NOMAD_IP_cql}"
      }

      service {
        name = "cassandra"
        tags = ["global", "cassandra"]
        port = "cql"
      }
    }
  }
}
then it never starts. Nomad's web UI shows nothing in the stdout logs of the created allocation, and the stderr logs only show Killed.
I know that when this happens, a Docker container gets created and then removed a few seconds later. I cannot read the logs of those containers, because when I try (with docker logs <container_id>), all I get is:
Error response from daemon: configured logging driver does not support reading
The allocation overview shows this message:
12/06/18 14:16:04 Terminated Exit Code: 137, Exit Message: "Docker container exited with non-zero exit code: 137"
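Exit code 137 is itself a strong hint: by Unix convention, an exit status above 128 means the process was killed by signal (status - 128), so 137 corresponds to SIGKILL, the signal the kernel's OOM killer (or Docker's memory-limit enforcement) delivers. A quick check of that arithmetic:

```python
import signal

# Docker reported exit code 137 for this container.  By Unix convention,
# an exit status above 128 means "terminated by signal (status - 128)".
exit_code = 137
sig = signal.Signals(exit_code - 128)
print(sig.name)  # SIGKILL
```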
According to Docker:

If there is no database initialized when the container starts, then a default database will be created. While this is the expected behavior, this means that it will not accept incoming connections until such initialization completes. This may cause issues when using automation tools, such as docker-compose, which start several containers simultaneously.
But I doubt this is the source of my problem, because increasing the values in the restart stanza has no effect, since the task fails after only a few seconds. A while back I ran into a similar problem with kafka, where it turned out the container was unhappy because it needed more memory. But in this case I am already providing generous values for memory and CPU in the resources stanza, and it seems to make no difference.
My host OS is Arch Linux, with kernel 4.19.4-arch1-1-ARCH. I run consul as a systemd service, and the Nomad agent with this command line:
sudo nomad agent -dev
What might I be missing? Any help and/or pointers would be greatly appreciated.
Update (2018-12-06 16:26 GMT): reading the output of the Nomad agent in detail, I found that some valuable information can be read in the host's /tmp directory. A snippet of that output:
2018/12/06 16:03:03 [DEBUG] memberlist: TCP connection from=127.0.0.1:45792
2018/12/06 16:03:03.180586 [DEBUG] driver.docker: docker pull cassandra:latest succeeded
2018-12-06T16:03:03.184Z [DEBUG] plugin: starting plugin: path=/usr/bin/nomad args="[/usr/bin/nomad executor {"LogFile":"/tmp/NomadClient073551030/1c315bf2-688c-2c7b-8d6f-f71fec1254f3/cassandra/executor.out","LogLevel":"DEBUG"}]"
2018-12-06T16:03:03.185Z [DEBUG] plugin: waiting for RPC address: path=/usr/bin/nomad
2018-12-06T16:03:03.235Z [DEBUG] plugin.nomad: plugin address: timestamp=2018-12-06T16:03:03.235Z address=/tmp/plugin681788273 network=unix
2018/12/06 16:03:03.253166 [DEBUG] driver.docker: Setting default logging options to syslog and unix:///tmp/plugin559865372
2018/12/06 16:03:03.253196 [DEBUG] driver.docker: Using config for logging: {Type:syslog ConfigRaw:[] Config:map[syslog-address:unix:///tmp/plugin559865372]}
2018/12/06 16:03:03.253206 [DEBUG] driver.docker: using 2147483648 bytes memory for cassandra
2018/12/06 16:03:03.253217 [DEBUG] driver.docker: using 800 cpu shares for cassandra
2018/12/06 16:03:03.253237 [DEBUG] driver.docker: binding directories []string{"/tmp/NomadClient073551030/1c315bf2-688c-2c7b-8d6f-f71fec1254f3/alloc:/alloc", "/tmp/NomadClient073551030/1c315bf2-688c-2c7b-8d6f-f71fec1254f3/cassandra/local:/local", "/tmp/NomadClient073551030/1c315bf2-688c-2c7b-8d6f-f71fec1254f3/cassandra/secrets:/secrets"} for cassandra
2018/12/06 16:03:03.253282 [DEBUG] driver.docker: allocated port 127.0.0.1:29073 -> 9042 (mapped)
2018/12/06 16:03:03.253296 [DEBUG] driver.docker: exposed port 9042
2018/12/06 16:03:03.253320 [DEBUG] driver.docker: setting container name to: cassandra-1c315bf2-688c-2c7b-8d6f-f71fec1254f3
2018/12/06 16:03:03.361162 [INFO] driver.docker: created container 29b0764bd2de69bda6450ebb1a55ffd2cbb4dc3002f961cb5db71b323d611199
2018/12/06 16:03:03.754476 [INFO] driver.docker: started container 29b0764bd2de69bda6450ebb1a55ffd2cbb4dc3002f961cb5db71b323d611199
2018/12/06 16:03:03.757642 [DEBUG] consul.sync: registered 1 services, 0 checks; deregistered 0 services, 0 checks
2018/12/06 16:03:03.765001 [DEBUG] client: error fetching stats of task cassandra: stats collection hasn't started yet
2018/12/06 16:03:03.894514 [DEBUG] client: updated allocations at index 371 (total 2) (pulled 0) (filtered 2)
2018/12/06 16:03:03.894584 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 2)
2018/12/06 16:03:05.190647 [DEBUG] driver.docker: error collecting stats from container 29b0764bd2de69bda6450ebb1a55ffd2cbb4dc3002f961cb5db71b323d611199: io: read/write on closed pipe
2018-12-06T16:03:09.191Z [DEBUG] plugin.nomad: 2018/12/06 16:03:09 [ERR] plugin: plugin server: accept unix /tmp/plugin681788273: use of closed network connection
2018-12-06T16:03:09.194Z [DEBUG] plugin: plugin process exited: path=/usr/bin/nomad
2018/12/06 16:03:09.223734 [INFO] client: task "cassandra" for alloc "1c315bf2-688c-2c7b-8d6f-f71fec1254f3" failed: Wait returned exit code 137, signal 0, and error Docker container exited with non-zero exit code: 137
2018/12/06 16:03:09.223802 [INFO] client: Restarting task "cassandra" for alloc "1c315bf2-688c-2c7b-8d6f-f71fec1254f3" in 2m7.683274502s
2018/12/06 16:03:09.230053 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 1 services, 0 checks
2018/12/06 16:03:09.233507 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks
2018/12/06 16:03:09.296185 [DEBUG] client: updated allocations at index 372 (total 2) (pulled 0) (filtered 2)
2018/12/06 16:03:09.296313 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 2)
2018/12/06 16:03:11.541901 [DEBUG] http: Request GET /v1/agent/health?type=client (452.678µs)
But the contents of /tmp/NomadClient.../<alloc_id>/... are deceptively simple:
[root@singularity 1c315bf2-688c-2c7b-8d6f-f71fec1254f3]# ls -lR
.:
total 0
drwxrwxrwx 5 nobody nobody 100 Dec 6 15:52 alloc
drwxrwxrwx 5 nobody nobody 120 Dec 6 15:53 cassandra
./alloc:
total 0
drwxrwxrwx 2 nobody nobody 40 Dec 6 15:52 data
drwxrwxrwx 2 nobody nobody 80 Dec 6 15:53 logs
drwxrwxrwx 2 nobody nobody 40 Dec 6 15:52 tmp
./alloc/data:
total 0
./alloc/logs:
total 0
-rw-r--r-- 1 root root 0 Dec 6 15:53 cassandra.stderr.0
-rw-r--r-- 1 root root 0 Dec 6 15:53 cassandra.stdout.0
./alloc/tmp:
total 0
./cassandra:
total 4
-rw-r--r-- 1 root root 1248 Dec 6 16:19 executor.out
drwxrwxrwx 2 nobody nobody 40 Dec 6 15:52 local
drwxrwxrwx 2 nobody nobody 60 Dec 6 15:52 secrets
drwxrwxrwt 2 nobody nobody 40 Dec 6 15:52 tmp
./cassandra/local:
total 0
./cassandra/secrets:
total 0
./cassandra/tmp:
total 0
Both cassandra.stdout.0 and cassandra.stderr.0 are empty, and the full content of executor.out is:
2018/12/06 15:53:22.822072 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin278120866
2018/12/06 15:55:53.009611 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin242312234
2018/12/06 15:58:29.135309 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin226242288
2018/12/06 16:00:53.942271 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin373025133
2018/12/06 16:03:03.252389 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin559865372
2018/12/06 16:05:19.656317 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin090082811
2018/12/06 16:07:28.468809 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin383954837
2018/12/06 16:09:54.068604 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin412544225
2018/12/06 16:12:10.085157 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin279043152
2018/12/06 16:14:48.255653 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin209533710
2018/12/06 16:17:23.735550 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin168184243
2018/12/06 16:19:40.232181 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin839254781
2018/12/06 16:22:13.485457 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin406142133
2018/12/06 16:24:24.869274 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin964077792
Update (2018-12-06 16:40 GMT): seeing that the agent apparently needed to log to syslog, I set up and started a local syslog server, to no avail. The syslog server received no messages.
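For context on the syslog plumbing seen above (Config:map[syslog-address:unix:///tmp/plugin559865372]): the Nomad executor launches a per-task syslog server on a unix socket and points Docker's syslog log driver at it. A minimal sketch of such a listener, assuming a unix datagram socket (Nomad's actual implementation may differ; the socket path and message here are made up for the example):

```python
import os
import socket
import tempfile

# Hypothetical socket path; Nomad uses paths like /tmp/pluginXXXXXXXXX.
path = os.path.join(tempfile.mkdtemp(), "syslog.sock")

# The "server" side: bind a unix datagram socket and wait for log records.
server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
server.bind(path)

# Simulate what a syslog client (e.g. Docker's syslog log driver) would send:
# a priority tag followed by the log line.
client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
client.sendto(b"<14>cassandra: Starting listening for CQL clients", path)

message = server.recv(4096)
print(message.decode())

client.close()
server.close()
os.unlink(path)
```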
1 answer
Problem solved. Its nature was twofold:
1. Nomad's Docker driver (quite effectively) encapsulates the containers' behaviour, which sometimes makes them very quiet.
2. Cassandra is hungry for resources. Much more so than I originally thought. I was convinced that 4 GB of RAM would be enough for it to run comfortably, but as it turns out it needs (at least in my environment) 6 GB.
Disclaimer: I am actually now using the bitnami/cassandra image instead of cassandra, because I believe their images to be of very high quality, secure, and configurable by means of environment variables. I made this discovery using Bitnami's image, and I have not yet tested how the original image reacts to being given this much memory. As for why it does not fail when running the container directly from Docker's CLI, I think that is because no limit is specified when running it that way. Docker simply takes as much memory as the container needs, so if the host eventually ran out of memory for all its containers, the failure would come much later (and probably more painfully). So this early failure should be seen as a welcome benefit of an orchestration platform like Nomad. If I have one complaint, it is that it took so long to find the problem, because the containers' visibility is so low!
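For reference, applying this finding to the job spec from the question would mean raising the memory in the resources stanza; a sketch based on the 6 GB figure above (6144 MB; exact needs will vary by environment and image):

```hcl
resources {
  memory = 6144   # MB; 6 GB, per the finding above (2048 was not enough)
  cpu    = 800

  network {
    port "cql" {}
  }
}
```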