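I am running a Kubernetes 1.25 cluster on Amazon EKS. I deployed the Anchore application using its Helm chart. I modified the container images so they are pulled from my AWS ECR repository instead of from Docker.
Looking at the logs of one of the pods, I can see that it is trying to reach the database service but cannot resolve it.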
(Background on this error at: https://sqlalche.me/e/14/e3q8)
[MainThread] 2023-04-30T00:06:41.155167 [anchore_enterprise_manager.util.db/connect_database()] [INFO] DB attempting to connect...
[MainThread] 2023-04-30T00:06:41.156165 [anchore_enterprise_manager.util.db/connect_database()] [WARN] DB connection failed, retrying - exception: test connection failed - exception: (psycopg2.OperationalError) could not translate host name "postgresql.anchore.svc.cluster.local:5432" to address: Name or service not known
Here is my postgresql service:
~ k get svc postgres-postgresql
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
postgres-postgresql ClusterIP 172.20.191.83 <none> 5432/TCP 27h
~ k get endpoints postgres-postgresql
NAME ENDPOINTS AGE
postgres-postgresql 10.1.0.74:5432 27h
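There is nothing in the postgres pod's logs.
I have verified that the AWS security groups are fully open and allow all traffic between the cluster and the nodes. I have also verified that CoreDNS is working: I started a busybox pod and resolved the service above from it.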
➜ anchore git:(main) ✗ k exec -it busybox-pod -- nslookup postgresql.anchore.svc.cluster.local
Server: 172.20.0.10
Address: 172.20.0.10:53
Name: postgresql.anchore.svc.cluster.local
Address: 172.20.191.83
Below are the logs from the postgresql pod:
k logs postgres-postgresql-59468ff768-zhn6z
Defaulted container "postgresql" out of: postgresql, postgres-postgresql
PostgreSQL Database directory appears to contain a database; Skipping initialization
2023-04-30 14:52:22.289 UTC [1] LOG: starting PostgreSQL 14.6 (Debian 14.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-04-30 14:52:22.289 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2023-04-30 14:52:22.289 UTC [1] LOG: listening on IPv6 address "::", port 5432
2023-04-30 14:52:22.292 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-30 14:52:22.296 UTC [27] LOG: database system was shut down at 2023-04-30 14:52:21 UTC
2023-04-30 14:52:22.300 UTC [1] LOG: database system is ready to accept connections
I have verified that the svc selector matches the pod labels.
➜ anchore git:(main) ✗ k describe svc postgresql
Name: postgresql
Namespace: anchore
Labels: app=postgresql
app.kubernetes.io/managed-by=Helm
chart=postgresql-1.0.1
heritage=Helm
release=postgres
Annotations: meta.helm.sh/release-name: postgres
meta.helm.sh/release-namespace: anchore
Selector: app=postgresql,release=postgres
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.20.191.83
IPs: 172.20.191.83
Port: postgresql 5432/TCP
TargetPort: postgresql/TCP
Endpoints:
Session Affinity: None
Events: <none>
k describe pods postgres-postgresql-59468ff768-zhn6z
Name: postgres-postgresql-59468ff768-zhn6z
Namespace: anchore
Priority: 0
Service Account: default
Node: ip-10-1-0-223.us-gov-east-1.compute.internal/10.1.0.223
Start Time: Sun, 30 Apr 2023 09:52:21 -0500
Labels: app=postgresql
pod-template-hash=59468ff768
release=postgres
Annotations: <none>
Status: Running
IP: 10.1.0.95
IPs:
IP: 10.1.0.95
Controlled By: ReplicaSet/postgres-postgresql-59468ff768
Containers:
postgresql:
Container ID: containerd://4a76d4582bc4e443cd9dc93e578576f13de0194cc36ec1acff62e5e45dd0e070
Image: 247301905713.dkr.ecr.us-gov-east-1.amazonaws.com/postgres:14
Image ID: 247301905713.dkr.ecr.us-gov-east-1.amazonaws.com/postgres@sha256:db02f92063fb6083cb9dbf9d967ae0563d17d1e6332b6dfba6bdd7266c420ffa
Port: 5432/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 30 Apr 2023 09:52:22 -0500
Ready: True
Restart Count: 0
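I would also like to add that I am seeing readiness/liveness probe failures in some of the pods.
I have verified that no network policies are in use. There are no iptables rules and no security groups blocking traffic.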
Type Reason Age From Message
Warning BackOff 17m (x5347 over 43h) kubelet Back-off restarting failed container
Warning Unhealthy 7m26s (x13887 over 43h) kubelet Readiness probe failed: % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to localhost port 8089: Connection refused
Warning Unhealthy 2m30s (x14341 over 43h) kubelet Readiness probe failed: Get "http://10.1.1.67:8668/health": dial tcp 10.1.1.67:8668: connect: connection refused
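If anyone can point me in the right direction, I would greatly appreciate it. I have only been learning k8s for about 2 months, so I may be making an obvious mistake here. Let me know if any other output would help.
What I have tried:
- Verifying that nslookup works for the svc IP
- Restarting the deployment, pods, and svcs
- Verifying the AWS security groups and add-ons
- Checking logs and events
- Deleting the pods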
1 Answer
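In this error:
could not translate host name "postgresql.anchore.svc.cluster.local:5432" to address: Name or service not known
it looks to me like the port, :5432, is included in the hostname. You have not shared your application configuration or how this hostname is passed in, but make sure the hostname does not include the port. Your nslookup test succeeded because it used the bare name postgresql.anchore.svc.cluster.local, without the :5432 suffix.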
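To illustrate the point, here is a minimal Python sketch of separating a combined "host:port" value into the two fields before they are handed to a database client. This is not Anchore's code and I do not know how your chart passes the value in; `split_host_port` and `DEFAULT_PG_PORT` are hypothetical names used only for this example.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: fall back to the standard PostgreSQL port.
DEFAULT_PG_PORT = 5432


def split_host_port(value, default_port=DEFAULT_PG_PORT):
    """Split 'host' or 'host:port' into a (host, port) tuple.

    Hypothetical helper: prefixing '//' lets urlsplit treat the
    value as a network location so it can separate host and port.
    """
    parts = urlsplit("//" + value)
    if not parts.hostname:
        raise ValueError(f"no hostname found in {value!r}")
    return parts.hostname, parts.port or default_port


# The value from the error message: the port is glued onto the hostname.
host, port = split_host_port("postgresql.anchore.svc.cluster.local:5432")
print(host)  # hostname only, suitable for a DNS lookup
print(port)  # port as an integer, to be passed separately
```

In practice the fix is more likely a change in your Helm values than in code: put only the service name in whatever setting holds the database host, and set the port in its own field if the chart has one.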