Intermittent Nginx timeouts on Kubernetes

zysjyyx4 · posted 2023-02-11 in Kubernetes

I can't figure out what is causing the trouble.
My setup:

  • A Kubernetes (v1.26) cluster with one master node and one worker node, self-deployed on virtual machines
  • An Nginx reverse proxy (currently running on the master)
  • A basic FastAPI pod, with Deployment, Service and Ingress YAML files (below)

I deployed the exact same setup on another cloud provider and had no trouble at all.
Here, everything works fine for a while: the API is reachable through the browser, and then it starts failing with 504 Gateway Timeout errors. Restarting the Nginx pod fixes the problem for an unknown amount of time before it comes back. I have seen the connection fail and then work again a few minutes apart; at the time of writing it has been working for an hour without interruption.
Here are the nginx logs between a successful request and the timeouts:

X.X.X.X - - [09/Feb/2023:12:30:18 +0000] "GET /docs HTTP/1.1" 200 952 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0" 373 0.019 [my-app-8005] [] 172.16.180.6:8005 952 0.019 200 22cd1b13ef2dcbf4b1be2983649f658c
X.X.X.X  - - [09/Feb/2023:12:30:19 +0000] "GET /openapi.json HTTP/1.1" 200 5868 "http://xxxx/docs" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0" 323 0.003 [my-app-8005] [] 172.16.180.6:8005 5868 0.003 200 46551c8481d446ec69de2399f49b7f86
I0209 12:31:13.983933       7 queue.go:87] "queuing" item="&ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ManagedFields:[]ManagedFieldsEntry{},}"
I0209 12:31:13.984018       7 queue.go:128] "syncing" key="&ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ManagedFields:[]ManagedFieldsEntry{},}"
I0209 12:31:13.990418       7 status.go:275] "skipping update of Ingress (no change)" namespace="namespace" ingress="app-ingress-xxxx"
I0209 12:32:13.983857       7 queue.go:87] "queuing" item="&ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ManagedFields:[]ManagedFieldsEntry{},}"
I0209 12:32:13.983939       7 queue.go:128] "syncing" key="&ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ManagedFields:[]ManagedFieldsEntry{},}"
I0209 12:32:13.990895       7 status.go:275] "skipping update of Ingress (no change)" namespace="namespace" ingress="app-ingress-xxxx"
2023/02/09 12:32:59 [error] 30#30: *4409 upstream timed out (110: Operation timed out) while connecting to upstream, client: X.X.X.X , server: xxxx, request: "GET /docs HTTP/1.1", upstream: "http://172.16.180.6:8005/docs", host: "xxxx"
2023/02/09 12:33:04 [error] 30#30: *4409 upstream timed out (110: Operation timed out) while connecting to upstream, client: X.X.X.X , server: xxxx, request: "GET /docs HTTP/1.1", upstream: "http://172.16.180.6:8005/docs", host: "xxxx"
2023/02/09 12:33:09 [error] 30#30: *4409 upstream timed out (110: Operation timed out) while connecting to upstream, client: X.X.X.X , server: xxxx, request: "GET /docs HTTP/1.1", upstream: "http://172.16.180.6:8005/docs", host: "xxxx"
X.X.X.X  - - [09/Feb/2023:12:33:09 +0000] "GET /docs HTTP/1.1" 504 160 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0" 373 15.004 [my-app-8005] [] 172.16.180.6:8005, 172.16.180.6:8005, 172.16.180.6:8005 0, 0, 0 5.001, 5.001, 5.001 504, 504, 504 56fb622d8d89d8d7b3cdbc4a094215c3

YAML config files:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress-xxxx
spec:
  ingressClassName: nginx
  rules:
  - host: xxxx
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port: 
              number: 8005
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: namespace
spec:
  progressDeadlineSeconds: 3600
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: backend
        image: xxxx
        imagePullPolicy: Always
        ports:
        - containerPort: 8005
      imagePullSecrets:
      - name: xxxx

---
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: namespace
  labels:
    app: my-app
spec:
  type: NodePort
  ports:
  - nodePort: 30008
    port: 8005
    protocol: TCP
  selector:
    app: my-app

I changed the app names and IPs for this post.
I should point out that during a timeout through nginx, I can still reach the app at worker-ip:nodePort, and from an ssh session on the master I can curl the FastAPI pod on its ClusterIP.
My first guess was a memory issue, although nothing else is currently running on these servers. I just installed the Kubernetes metrics API and am waiting for the next outage; so far it shows nothing wrong.
What could be causing this behavior? Thanks for any suggestions on what to check next!

yeotifhr1#

If you are getting 504 Gateway Timeout errors, your system may be running short on resources; increase the resources available to your environment to resolve this. A 504 error means that nginx waited too long for a response and timed out. You also need to add ingress annotations to your YAML configuration. By default, proxy_read_timeout is 60s:

Syntax:  proxy_read_timeout time;
Default: proxy_read_timeout 60s;
Context: http, server, location

Defines a timeout for reading a response from the proxied server. The timeout is set only between two successive read operations, not for the transmission of the whole response. If the proxied server transmits nothing within this time, the connection is closed. See the documentation for details.
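
Assuming you are running the community ingress-nginx controller (which the controller log lines above suggest), these proxy timeouts can be raised per Ingress through annotations instead of editing nginx.conf directly. A minimal sketch based on the Ingress above; the 120-second values are only illustrative:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress-xxxx
  annotations:
    # Values are in seconds; 120 is just an example, tune them to your backend.
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
spec:
  ingressClassName: nginx
  rules:
  - host: xxxx
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 8005

Note that the logged errors say "timed out ... while connecting to upstream" after roughly 5 seconds per try, so the connect timeout, rather than the read timeout, may be the one actually being hit in your case.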
