I have a k3s (v1.27.4+k3s1) Kubernetes cluster on Hetzner (deployed with https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner). I created new nodes with better hardware and joined them to the cluster, then cordoned, drained, and shut down the old nodes. However, some pods still try to schedule onto the old nodes, even though those nodes are no longer in the cluster (error: nodeinfo not found for node name "agent-cx21-fsn1-iof"). I tried deleting the HelmRelease, but when Flux recreates it, the pod again tries to schedule on the old node. I'm not sure how to diagnose this further; any hints?
Pod:
apiVersion: v1
kind: Pod
status:
phase: Pending
conditions:
- type: PodScheduled
status: 'False'
lastProbeTime: null
lastTransitionTime: '2023-09-05T13:47:11Z'
reason: SchedulerError
message: nodeinfo not found for node name "vpl-agent-cx21-fsn1-iof"
qosClass: Burstable
spec:
volumes:
- name: config
persistentVolumeClaim:
claimName: config-pgadmin-0
- name: kube-api-access-zk8jq
projected:
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
name: kube-root-ca.crt
items:
- key: ca.crt
path: ca.crt
- downwardAPI:
items:
- path: namespace
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
defaultMode: 420
containers:
- name: pgadmin
image: docker.io/dpage/pgadmin4:7
ports:
- name: http
containerPort: 8080
protocol: TCP
envFrom:
- secretRef:
name: pgadmin-secret
env:
- name: PGADMIN_LISTEN_PORT
value: '8080'
resources:
limits:
memory: 512Mi
requests:
cpu: 10m
memory: 128Mi
volumeMounts:
- name: config
mountPath: /var/lib/pgadmin
- name: kube-api-access-zk8jq
readOnly: true
mountPath: /var/run/secrets/kubernetes.io/serviceaccount
livenessProbe:
httpGet:
path: /misc/ping
port: 8080
scheme: HTTP
initialDelaySeconds: 3
timeoutSeconds: 1
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
readinessProbe:
httpGet:
path: /misc/ping
port: 8080
scheme: HTTP
initialDelaySeconds: 3
timeoutSeconds: 1
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
restartPolicy: Always
terminationGracePeriodSeconds: 30
dnsPolicy: ClusterFirst
serviceAccountName: default
serviceAccount: default
automountServiceAccountToken: true
securityContext:
runAsUser: 5050
runAsGroup: 5050
supplementalGroups:
- 44
- 109
- 100
fsGroup: 5050
fsGroupChangePolicy: OnRootMismatch
hostname: pgadmin-0
subdomain: pgadmin
schedulerName: default-scheduler
tolerations:
- key: node.kubernetes.io/not-ready
operator: Exists
effect: NoExecute
tolerationSeconds: 300
- key: node.kubernetes.io/unreachable
operator: Exists
effect: NoExecute
tolerationSeconds: 300
priority: 0
enableServiceLinks: true
preemptionPolicy: PreemptLowerPriority
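
In case it is relevant, this is what I have checked so far (a sketch; the PV name is whatever config-pgadmin-0 happens to be bound to in my cluster):

# Find the PV that the pod's claim is bound to
kubectl get pvc config-pgadmin-0 -o jsonpath='{.spec.volumeName}'

# Inspect that PV; the Node Affinity section shows which node it is pinned to
kubectl describe pv <pv-name>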
1 Answer
I had a similar problem (a node died) and noticed that some PVCs from the previous deployment were not deleted when the node was removed, which made the deployment keep failing. After I deleted those PVCs, the deployment came up correctly on a different node.
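
That is most likely what is happening here too: with the default k3s local-path provisioner, every PV carries a nodeAffinity pinning it to the node it was created on, so a pod bound to such a claim can only ever be scheduled there. A minimal cleanup sketch, assuming pgadmin is a StatefulSet using the local-path provisioner (the data lives on the old node's disk and is lost once the claim is gone, so copy it off first if you need it):

# List each PV with its claim and the node it is pinned to
kubectl get pv -o custom-columns='NAME:.metadata.name,CLAIM:.spec.claimRef.name,NODE:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]'

# Delete the stale claim; it only disappears once the pod stops using it,
# so delete the pending pod as well. The StatefulSet then recreates both
# from its volumeClaimTemplate, and the new PV lands on a live node.
kubectl delete pvc config-pgadmin-0
kubectl delete pod pgadmin-0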