Kubernetes Pod not connecting to database service

Posted by pkln4tw6 on 2023-05-06 in Kubernetes

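I am running a Kubernetes 1.25 cluster on Amazon EKS. I deployed the Anchore application using its Helm chart. I modified the container images so that they are pulled from my AWS ECR repository instead of Docker.
Looking at the logs of one of the pods, I see that it is trying to reach the database service but cannot resolve it.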

(Background on this error at: https://sqlalche.me/e/14/e3q8)
[MainThread] 2023-04-30T00:06:41.155167 [anchore_enterprise_manager.util.db/connect_database()] [INFO] DB attempting to connect...
[MainThread] 2023-04-30T00:06:41.156165 [anchore_enterprise_manager.util.db/connect_database()] [WARN] DB connection failed, retrying - exception: test connection failed - exception: (psycopg2.OperationalError) could not translate host name "postgresql.anchore.svc.cluster.local:5432" to address: Name or service not known

Here is my postgresql service:

~ k get service postgres-postgresql
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
postgres-postgresql   ClusterIP   172.20.191.83   <none>        5432/TCP   27h

~ k get endpoints postgres-postgresql
NAME                  ENDPOINTS        AGE
postgres-postgresql   10.1.0.74:5432   27h

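There is nothing in the postgres pod logs.
I have verified that the AWS security groups are wide open and allow all traffic between the cluster and the nodes. I also verified that CoreDNS is working: I spun up a busybox pod and it resolved the service above.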

➜  anchore git:(main) ✗ k exec -it busybox-pod -- nslookup postgresql.anchore.svc.cluster.local
Server:     172.20.0.10
Address:    172.20.0.10:53

Name:   postgresql.anchore.svc.cluster.local
Address: 172.20.191.83
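
The same lookup can be reproduced from Python, which is closer to what the application does when it connects through psycopg2. This is a minimal sketch, assuming the default cluster domain `cluster.local`; `service_fqdn` and `resolve` are hypothetical helper names, not part of any library.

```python
import socket


def service_fqdn(service, namespace, cluster_domain="cluster.local"):
    """Build the in-cluster DNS name of a Service (assumes the default cluster domain)."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(hostname):
    """Return the sorted IP addresses a hostname resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Raised when the name cannot be translated to an address.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Run inside a pod, `resolve(service_fqdn("postgresql", "anchore"))` should return the ClusterIP shown above. Calling `resolve` with `:5432` appended to the name would be expected to return an empty list, since the port is not part of a DNS name.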

Here are the logs from the postgresql pod:

k logs postgres-postgresql-59468ff768-zhn6z   
Defaulted container "postgresql" out of: postgresql, postgres-postgresql

PostgreSQL Database directory appears to contain a database; Skipping initialization

2023-04-30 14:52:22.289 UTC [1] LOG:  starting PostgreSQL 14.6 (Debian 14.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-04-30 14:52:22.289 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-04-30 14:52:22.289 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2023-04-30 14:52:22.292 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-30 14:52:22.296 UTC [27] LOG:  database system was shut down at 2023-04-30 14:52:21 UTC
2023-04-30 14:52:22.300 UTC [1] LOG:  database system is ready to accept connections

I have verified that the svc selector matches the pod labels.

➜  anchore git:(main) ✗ k describe svc  postgresql
Name:              postgresql
Namespace:         anchore
Labels:            app=postgresql
                   app.kubernetes.io/managed-by=Helm
                   chart=postgresql-1.0.1
                   heritage=Helm
                   release=postgres
Annotations:       meta.helm.sh/release-name: postgres
                   meta.helm.sh/release-namespace: anchore
Selector:          app=postgresql,release=postgres
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                172.20.191.83
IPs:               172.20.191.83
Port:              postgresql  5432/TCP
TargetPort:        postgresql/TCP
Endpoints:         
Session Affinity:  None
Events:            <none>
k describe pods postgres-postgresql-59468ff768-zhn6z 
Name:             postgres-postgresql-59468ff768-zhn6z
Namespace:        anchore
Priority:         0
Service Account:  default
Node:             ip-10-1-0-223.us-gov-east-1.compute.internal/10.1.0.223
Start Time:       Sun, 30 Apr 2023 09:52:21 -0500
Labels:           app=postgresql
                  pod-template-hash=59468ff768
                  release=postgres
Annotations:      <none>
Status:           Running
IP:               10.1.0.95
IPs:
  IP:           10.1.0.95
Controlled By:  ReplicaSet/postgres-postgresql-59468ff768
Containers:
  postgresql:
    Container ID:   containerd://4a76d4582bc4e443cd9dc93e578576f13de0194cc36ec1acff62e5e45dd0e070
    Image:          247301905713.dkr.ecr.us-gov-east-1.amazonaws.com/postgres:14
    Image ID:       247301905713.dkr.ecr.us-gov-east-1.amazonaws.com/postgres@sha256:db02f92063fb6083cb9dbf9d967ae0563d17d1e6332b6dfba6bdd7266c420ffa
    Port:           5432/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 30 Apr 2023 09:52:22 -0500
    Ready:          True
    Restart Count:  0
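
A Service selects a pod when every key/value pair in its selector also appears in the pod's labels; extra pod labels are ignored. The check can be sketched as follows, using the values from the output above (`selector_matches` is a hypothetical helper, not a Kubernetes API):

```python
def selector_matches(selector, pod_labels):
    """True when every selector key/value pair is present in the pod labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Values copied from the `describe` output above.
selector = {"app": "postgresql", "release": "postgres"}
pod_labels = {
    "app": "postgresql",
    "pod-template-hash": "59468ff768",
    "release": "postgres",
}

print(selector_matches(selector, pod_labels))
```

Matching labels are necessary but not sufficient for the Service to list endpoints: the pod also has to be Ready, and a named `TargetPort` such as `postgresql` has to correspond to a container port with that name.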

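I would also add that I am seeing readiness/liveness probe failures in some of the pods.
I have verified that no network policies are in use. No iptables rules. No security groups are blocking traffic.

  Type     Reason   Age                   From     Message
  Warning  BackOff  17m (x5347 over 43h)  kubelet  Back-off restarting failed container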

Warning  Unhealthy  7m26s (x13887 over 43h)  kubelet  Readiness probe failed:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to localhost port 8089: Connection refused
  Warning  Unhealthy  2m30s (x14341 over 43h)  kubelet  Readiness probe failed: Get "http://10.1.1.67:8668/health": dial tcp 10.1.1.67:8668: connect: connection refused
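
Both probe failures report `connection refused`, which usually means nothing was listening on the target port at that moment, rather than traffic being dropped on the way. A minimal sketch of the same check from Python follows; `port_is_open` is a hypothetical helper, not part of any library.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Run from inside the affected container, `port_is_open("localhost", 8089)` mirrors the curl-based probe, and `port_is_open("10.1.1.67", 8668)` mirrors the HTTP probe against the pod IP.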

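I would appreciate it if someone could point me in the right direction. I have only been learning k8s for about 2 months, so I may be making an obvious mistake here. Let me know if any other output would help.
What I have tried:

  • Verified that nslookup works for the svc IP
  • Restarted the deployments, pods, and svcs
  • Verified the AWS security groups and add-ons
  • Checked logs and events
  • Deleted the pods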
ruarlubt #1

This error:

could not translate host name "postgresql.anchore.svc.cluster.local:5432" to address: Name or service not known

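looks to me like :5432 is being included in the hostname. You have not shared the application configuration or how this hostname is passed in, but make sure the hostname does not include the port.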
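
To illustrate the point, here is a minimal sketch of separating a combined `host:port` value before it reaches the database driver. The `split_host_port` helper is hypothetical, and how Anchore actually receives the hostname is not shown in the question, so the real fix belongs wherever that value is configured.

```python
def split_host_port(value, default_port=5432):
    """Split 'host' or 'host:port' into (host, port).

    Assumes a DNS name or IPv4 address; bracketed IPv6 literals are not handled.
    """
    host, sep, port = value.rpartition(":")
    if not sep:
        # No colon present: the whole value is the host.
        return value, default_port
    if not port.isdigit():
        raise ValueError(f"invalid port in {value!r}")
    return host, int(port)


host, port = split_host_port("postgresql.anchore.svc.cluster.local:5432")
print(host, port)  # postgresql.anchore.svc.cluster.local 5432

# Pass the two parts separately, for example in a connection URL.
# The user, password, and database name below are placeholders.
url = f"postgresql://user:password@{host}:{port}/dbname"
print(url)
```

With the host and port kept apart, only `postgresql.anchore.svc.cluster.local` is handed to DNS, which is the name that nslookup resolved successfully in the question.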
