Kubernetes keeps trying to schedule onto old nodes

igetnqfo · posted 2023-10-17 in Kubernetes

I have a k3s (v1.27.4+k3s1) Kubernetes cluster on Hetzner (deployed with https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner). I created new nodes with better hardware and joined them to the cluster, then cordoned, drained and shut down the old nodes. Some Pods, however, still want to be scheduled on the old nodes even though those nodes are no longer in the cluster (error: nodeinfo not found for node name "agent-cx21-fsn1-iof"). I tried deleting the HelmRelease, but when Flux recreates it, the Pod still tries to schedule onto the old node. I'm not sure how to diagnose this further, any hints?
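For reference, the old nodes were decommissioned roughly like this (a sketch only; the node name is taken from the error message above, and the drain flags shown are the usual ones, not necessarily the exact commands that were run):

# mark the old node unschedulable, evict its workloads, then remove the Node object
kubectl cordon agent-cx21-fsn1-iof
kubectl drain agent-cx21-fsn1-iof --ignore-daemonsets --delete-emptydir-data
kubectl delete node agent-cx21-fsn1-iof
# confirm the old Node objects are really gone from the API server
kubectl get nodes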
Pod:

apiVersion: v1
kind: Pod
status:
  phase: Pending
  conditions:
    - type: PodScheduled
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2023-09-05T13:47:11Z'
      reason: SchedulerError
      message: nodeinfo not found for node name "vpl-agent-cx21-fsn1-iof"
  qosClass: Burstable
spec:
  volumes:
    - name: config
      persistentVolumeClaim:
        claimName: config-pgadmin-0
    - name: kube-api-access-zk8jq
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: pgadmin
      image: docker.io/dpage/pgadmin4:7
      ports:
        - name: http
          containerPort: 8080
          protocol: TCP
      envFrom:
        - secretRef:
            name: pgadmin-secret
      env:
        - name: PGADMIN_LISTEN_PORT
          value: '8080'
      resources:
        limits:
          memory: 512Mi
        requests:
          cpu: 10m
          memory: 128Mi
      volumeMounts:
        - name: config
          mountPath: /var/lib/pgadmin
        - name: kube-api-access-zk8jq
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      livenessProbe:
        httpGet:
          path: /misc/ping
          port: 8080
          scheme: HTTP
        initialDelaySeconds: 3
        timeoutSeconds: 1
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /misc/ping
          port: 8080
          scheme: HTTP
        initialDelaySeconds: 3
        timeoutSeconds: 1
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  serviceAccountName: default
  serviceAccount: default
  automountServiceAccountToken: true
  securityContext:
    runAsUser: 5050
    runAsGroup: 5050
    supplementalGroups:
      - 44
      - 109
      - 100
    fsGroup: 5050
    fsGroupChangePolicy: OnRootMismatch
  hostname: pgadmin-0
  subdomain: pgadmin
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority

hgtggwj0 1#

I had a similar problem (a node died) and noticed that a previous deployment had some PVC(s) that were not deleted when the node was removed, and the deployment kept failing. After deleting the PVCs, the deployment came up correctly on a different node.
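That matches this setup: k3s's default local-path provisioner creates PersistentVolumes whose nodeAffinity is pinned to the node that holds the data, so a Pod whose claim is bound to such a PV can only ever be scheduled onto the (now deleted) node. A minimal cleanup sketch, using the claim name from the Pod spec above; <namespace> and <pv-name> are placeholders:

# find the PV behind the claim and check which node it is pinned to
kubectl get pvc config-pgadmin-0 -n <namespace>   # the VOLUME column shows the bound PV
kubectl describe pv <pv-name>                     # look at the Node Affinity section
# if it points at the deleted node, drop the claim
# (this discards the data that lived on the old node's disk, so copy it off first if needed)
kubectl delete pvc config-pgadmin-0 -n <namespace>

If pgadmin is a StatefulSet (the pgadmin-0 naming suggests so), deleting the stuck Pod as well usually lets the controller recreate both the Pod and the claim, after which a fresh PV is provisioned on one of the new nodes.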
