kubernetes - Unable to restore a backup with PVs and PVCs to a cluster in a different region [AWS EKS]

ejk8hzay · posted 2023-04-20 in Kubernetes

Hi, we are using Velero as part of our DR planning and are putting together a cross-region backup/restore strategy. We back up workloads, PVs, and PVCs, but we hit a problem restoring a backup from one region (us-east-2) to a cluster in a second region (us-west-2).
Installation completes smoothly on both clusters using the following command:

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.4.0 \
    --bucket velerobucket \
    --backup-location-config region=us-east-2 \
    --snapshot-location-config region=us-east-2 \
    --secret-file secret-file
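
To confirm the install is healthy before running a backup, a quick check with standard kubectl/velero commands:

kubectl get deployment velero -n velero   # the velero server deployment should be Available
velero backup-location get                # the bucket should show as Available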

The backup also completes without any errors:

velero backup create zookeeperbkp --include-namespaces zookeeper --snapshot-volumes
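
Backup progress can be watched until the phase reaches Completed:

velero backup get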

Running the restore on the us-west-2 cluster from us-east-2 completes successfully, with no errors in the Velero restore logs, but the zookeeper pods go into Pending:

velero restore create --from-backup zookeeperbkp

kubectl get pods -n zookeeper
NAME          READY   STATUS    RESTARTS   AGE
zookeeper-0   0/2     Pending   0          3m24s
zookeeper-1   0/2     Pending   0          3m24s
zookeeper-2   0/2     Pending   0          3m24s

Describing the pods, they complain:

0/1 nodes are available: 1 node(s) had volume node affinity conflict.
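
A quick way to surface the conflict is to check the zone labels and node affinity on the restored PVs (PV name below taken from this backup):

kubectl get pv --show-labels | grep topology.kubernetes.io/zone
kubectl describe pv pvc-261b9803-8e55-4880-bb31-b29ca3a6c323 | grep -A3 'Node Affinity'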

Describing the PVs shows they were created with us-east-2 zone labels, when they should carry us-west-2 (the restore cluster's region).
After all this, I read more about Velero's limitations around restoring PVs and PVCs across regions.
I tried to work around it by modifying Velero's snapshot JSON files in S3:

# Pull the backup files, rewrite the region in the snapshot records, re-compress, and upload
aws s3 cp s3://velerobkpxyz/backups/zookeeper/ ./ --recursive
gunzip zookeeper-volumesnapshots.json.gz
sed -i "s/us-east-2/us-west-2/g" zookeeper-volumesnapshots.json
gzip zookeeper-volumesnapshots.json   # re-compress before uploading
aws s3 cp zookeeper-volumesnapshots.json.gz s3://velerobkp/backups/zookeeper/zookeeper-volumesnapshots.json.gz
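
To see which AZ each snapshot record references before and after the edit (assuming jq is installed; volumeAZ is, as far as I can tell, the zone field in Velero's native-snapshot records):

gunzip -c zookeeper-volumesnapshots.json.gz | jq -r '.[].spec.volumeAZ'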

Similarly, I made the same change to zookeeper.tar.gz:

mkdir zookeeper-temp
tar xzf zookeeper.tar.gz -C zookeeper-temp/
cd zookeeper-temp/
# Rewrite the region in every manifest inside the archive
find . -name '*.json' -exec sed -i 's/us-east-2/us-west-2/g' {} \;
tar czf ../zookeeper.tar.gz *
cd ..
aws s3 cp zookeeper.tar.gz s3://velerobkp/backups/zookeeper/
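
A sanity check that no source-region references remain in the extracted manifests:

grep -r us-east-2 zookeeper-temp/ || echo 'no source-region references left'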

After this, velero backup describe reports the correct region names for the PVs:

velero backup describe zookeeper9 --details

Name:         zookeeper9
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.21.5-eks-bc4871b
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=21+

Phase:  Completed

Errors:    0
Warnings:  0

Namespaces:
  Included:  zookeeper
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  true

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2022-03-30 20:37:53 +0530 IST
Completed:  2022-03-30 20:37:57 +0530 IST

Expiration:  2022-04-29 20:37:53 +0530 IST

Total items to be backed up:  52
Items backed up:              52

Resource List:
  apiextensions.k8s.io/v1/CustomResourceDefinition:
    - servicemonitors.monitoring.coreos.com
  apps/v1/ControllerRevision:
    - zookeeper/zookeeper-596cddb599
    - zookeeper/zookeeper-5977bdccb6
    - zookeeper/zookeeper-5cd569cbf9
    - zookeeper/zookeeper-6585c9bc89
    - zookeeper/zookeeper-6bf55cfd99
    - zookeeper/zookeeper-856646d9f6
    - zookeeper/zookeeper-8cdd5f46
    - zookeeper/zookeeper-ccf87988c
  apps/v1/StatefulSet:
    - zookeeper/zookeeper
  discovery.k8s.io/v1/EndpointSlice:
    - zookeeper/zookeeper-headless-2tnx5
    - zookeeper/zookeeper-mzdlc
  monitoring.coreos.com/v1/ServiceMonitor:
    - zookeeper/zookeeper-exporter
  policy/v1/PodDisruptionBudget:
    - zookeeper/zookeeper
  v1/ConfigMap:
    - zookeeper/kube-root-ca.crt
    - zookeeper/zookeeper
  v1/Endpoints:
    - zookeeper/zookeeper
    - zookeeper/zookeeper-headless
  v1/Namespace:
    - zookeeper
  v1/PersistentVolume:
    - pvc-261b9803-8e55-4880-bb31-b29ca3a6c323
    - pvc-89cfd5b9-65da-4fd1-a095-83d21d1d21db
    - pvc-9e027e4c-cc9e-11ea-9ce3-061b42a2865e
    - pvc-a835d78d-9dfd-41f7-92bd-7f2e752dbeb7
    - pvc-c0e454f7-cc9e-11ea-9ce3-061b42a2865e
    - pvc-ee6aad46-cc9e-11ea-9ce3-061b42a2865e
  v1/PersistentVolumeClaim:
    - zookeeper/data-zookeeper-0
    - zookeeper/data-zookeeper-1
    - zookeeper/data-zookeeper-2
    - zookeeper/data-zookeeper-3
    - zookeeper/data-zookeeper-4
    - zookeeper/data-zookeeper-5
  v1/Pod:
    - zookeeper/zookeeper-0
    - zookeeper/zookeeper-1
    - zookeeper/zookeeper-2
    - zookeeper/zookeeper-3
    - zookeeper/zookeeper-4
    - zookeeper/zookeeper-5
  v1/Secret:
    - zookeeper/default-token-kcl4m
    - zookeeper/sh.helm.release.v1.zookeeper.v1
    - zookeeper/sh.helm.release.v1.zookeeper.v10
    - zookeeper/sh.helm.release.v1.zookeeper.v11
    - zookeeper/sh.helm.release.v1.zookeeper.v12
    - zookeeper/sh.helm.release.v1.zookeeper.v13
    - zookeeper/sh.helm.release.v1.zookeeper.v4
    - zookeeper/sh.helm.release.v1.zookeeper.v5
    - zookeeper/sh.helm.release.v1.zookeeper.v6
    - zookeeper/sh.helm.release.v1.zookeeper.v7
    - zookeeper/sh.helm.release.v1.zookeeper.v8
    - zookeeper/sh.helm.release.v1.zookeeper.v9
  v1/Service:
    - zookeeper/zookeeper
    - zookeeper/zookeeper-headless
  v1/ServiceAccount:
    - zookeeper/default

Velero-Native Snapshots:
  pvc-9e027e4c-cc9e-11ea-9ce3-061b42a2865e:
    Snapshot ID:        snap-0f81f2f62e476584a
    Type:               gp2
    Availability Zone:  us-west-2b
    IOPS:               <N/A>
  pvc-c0e454f7-cc9e-11ea-9ce3-061b42a2865e:
    Snapshot ID:        snap-0c689771f3dbfa361
    Type:               gp2
    Availability Zone:  us-west-2a
    IOPS:               <N/A>
  pvc-ee6aad46-cc9e-11ea-9ce3-061b42a2865e:
    Snapshot ID:        snap-068c63f1bb31af3cc
    Type:               gp2
    Availability Zone:  us-west-2b
    IOPS:               <N/A>
  pvc-89cfd5b9-65da-4fd1-a095-83d21d1d21db:
    Snapshot ID:        snap-050e2e51eac92bd74
    Type:               gp2
    Availability Zone:  us-west-2a
    IOPS:               <N/A>
  pvc-261b9803-8e55-4880-bb31-b29ca3a6c323:
    Snapshot ID:        snap-08e45396c99e7aac3
    Type:               gp2
    Availability Zone:  us-west-2b
    IOPS:               <N/A>
  pvc-a835d78d-9dfd-41f7-92bd-7f2e752dbeb7:
    Snapshot ID:        snap-07ad93657b0bdc1a6
    Type:               gp2
    Availability Zone:  us-west-2a
    IOPS:               <N/A>

But the restore attempt fails:

velero restore create --from-backup zookeeper9

velero restore describe zookeeper9-20220331145320
Name:         zookeeper9-20220331145320
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:                       PartiallyFailed (run 'velero restore logs zookeeper9-20220331145320' for more information)
Total items to be restored:  52
Items restored:              52

Started:    2022-03-31 14:53:24 +0530 IST
Completed:  2022-03-31 14:53:36 +0530 IST

Warnings:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    zookeeper:  could not restore, ConfigMap "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version.

Errors:
  Velero:     <none>
  Cluster:  error executing PVAction for persistentvolumes/pvc-261b9803-8e55-4880-bb31-b29ca3a6c323: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2b' does not exist.
  status code: 400, request id: 2b5ae55c-dfd5-4c52-8494-105e46bce78b
    error executing PVAction for persistentvolumes/pvc-89cfd5b9-65da-4fd1-a095-83d21d1d21db: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2a' does not exist.
  status code: 400, request id: ed91b698-d3b9-450f-b7b4-a3869cbae6ae
    error executing PVAction for persistentvolumes/pvc-9e027e4c-cc9e-11ea-9ce3-061b42a2865e: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2b' does not exist.
  status code: 400, request id: 2b493106-84c6-4210-9663-4d00f47c06de
    error executing PVAction for persistentvolumes/pvc-a835d78d-9dfd-41f7-92bd-7f2e752dbeb7: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2a' does not exist.
  status code: 400, request id: 387c6c27-6b18-4bc6-9bb8-3ed152cb49d1
    error executing PVAction for persistentvolumes/pvc-c0e454f7-cc9e-11ea-9ce3-061b42a2865e: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2a' does not exist.
  status code: 400, request id: 7d7d2931-e7d9-4bc5-8cb1-20e3b2849fe2
    error executing PVAction for persistentvolumes/pvc-ee6aad46-cc9e-11ea-9ce3-061b42a2865e: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2b' does not exist.
  status code: 400, request id: 75648031-97ca-4e2a-a079-8f6618902b2a
  Namespaces: <none>

Backup:  zookeeper9

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Preserve Service NodePorts:  auto

It complains:

Cluster:  error executing PVAction for persistentvolumes/pvc-261b9803-8e55-4880-bb31-b29ca3a6c323: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2b' does not exist.

  status code: 400, request id: 2b5ae55c-dfd5-4c52-8494-105e46bce78b

I am not sure why this happens; is there something I am missing?
This makes me wonder whether something also needs to be done with the snapshots themselves, since the backed-up snapshot IDs exist in the source region and are not available in the target region.
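
For reference, EBS snapshots are region-scoped, so the suspicion is right: they would have to be copied to the target region first. A manual copy looks like this (snapshot ID taken from the backup output above), though note the copy gets a new snapshot ID, which plain Velero has no way to pick up:

aws ec2 copy-snapshot \
    --source-region us-east-2 \
    --source-snapshot-id snap-0f81f2f62e476584a \
    --region us-west-2 \
    --description 'zookeeper PV snapshot copied for cross-region restore'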

vojdkbi0 · 1#

Since Velero currently has no built-in support for this, I had to solve it with a workaround.
Thanks to jglick for the PRs proposing this feature against the velero repo and velero-plugin-for-aws. After building images from those two branches, I was able to copy PVs and PVCs to a different region.
As noted, this is not a mainline solution, since the PRs are not merged; I would advise against using it directly in prod without thorough testing, which is also what the PRs' contributor suggests.
Please read through the discussion and steps in https://github.com/vmware-tanzu/velero-plugin-for-aws/pull/90
Step 1: Build images from these two PR branches:
https://github.com/jglick/velero/tree/concurrent-snapshot
https://github.com/jglick/velero-plugin-for-aws/tree/x-region
In the steps below, substitute the registry and repositories of your choice.

Steps for AWS ECR

1. Create the velero and velero-plugin-for-aws repositories

ex: aws ecr create-repository --repository-name testing/velero --region $region || echo already exists

Create the repository for velero-plugin-for-aws:

ex: aws ecr create-repository --repository-name testing/velero-plugin-for-aws --region $region || echo already exists

2. Build the velero container

command:
make -C /path/to/velero REGISTRY=$registry/testing VERSION=testing container

ex: make -C . REGISTRY=123456789.dkr.ecr.us-west-2.amazonaws.com/testing VERSION=0.1 container

3. Build the velero-plugin-for-aws container

command:
docker build -t $registry/testing/velero-plugin-for-aws /path/to/patched/velero-plugin-for-aws

ex: docker build -t 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero-plugin-for-aws velero-plugin-for-aws

4. Log in to AWS ECR in the region you will push the images to

command:
aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $registry

ex: aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 123456789.dkr.ecr.us-west-2.amazonaws.com

5. Push the velero and velero-plugin-for-aws images to the repositories

command:
docker push $registry/testing/velero
docker push $registry/testing/velero-plugin-for-aws

ex: docker push 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero:main
    docker push 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero-plugin-for-aws

Your images are now in the repositories and can be used to create backups and restores in whichever region you need.

Now install Velero both in the region you want to back up from and in the other region you want to restore to.

Create a values file with the current region and an altRegion, so that when a backup runs in the current region, the volumes of StatefulSets with PVs are also copied to the alternate region you specify.
Below is an example with us-east-2 as the source region and us-west-2 as the alternate region:

ex:

cat /tmp/velero-us-east-2.yaml
image:
  repository: 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero
  tag: main
initContainers:
- name: velero-plugin-for-aws
  image: 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero-plugin-for-aws:latest
  volumeMounts:
  - mountPath: /target
    name: plugins
configuration:
  provider: aws
  backupStorageLocation:
    bucket: <your-bucket-name>
    config:
      region: us-east-2
  volumeSnapshotLocation:
    config:
      region: us-east-2
      altRegion: us-west-2
  extraEnvVars:
    AWS_CLUSTER_NAME: <your-EKS-Cluster-name>
    VELERO_AWS_AZ_OVERRIDE: us-east-2a
serviceAccount:
  server:
    create: true
    name: velero
credentials:
  useSecret: true
  secretContents:
    cloud: |
      [default]
      aws_access_key_id=<velero-user-creds>
      aws_secret_access_key=<velero-user-creds>

So in this case, when a backup is taken in us-east-2, the snapshots are copied to the us-west-2 region.
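
If the vmware-tanzu chart repository is not configured yet, add it first (standard chart location):

helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update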
Install Velero with Helm in the source region us-east-2:

helm install velero vmware-tanzu/velero --version 2.24.0 --namespace velero --create-namespace -f /tmp/velero-us-east-2.yaml

Similarly, Velero needs to be configured in the region where the backup will be restored, us-west-2 in our case:

Ex:

cat /tmp/velero-restore-us-west-2.yaml

image:
  repository: 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero
  tag: main
initContainers:
- name: velero-plugin-for-aws
  image: 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero-plugin-for-aws:latest
  volumeMounts:
  - mountPath: /target
    name: plugins
configuration:
  provider: aws
  backupStorageLocation:
    bucket: velerobkptest
    config:
      region: us-east-2
  volumeSnapshotLocation:
    config:
      region: us-west-2
      altRegion: us-west-2
  extraEnvVars:
    AWS_CLUSTER_NAME: <your-cluster-name in current region>
    VELERO_AWS_AZ_OVERRIDE: us-west-2a
serviceAccount:
  server:
    create: true
    name: velero
credentials:
  useSecret: true
  secretContents:
    cloud: |
      [default]
      aws_access_key_id=<velero-user-creds>
      aws_secret_access_key=<velero-user-creds>

Do a Helm install:

helm install velero vmware-tanzu/velero --version 2.24.0 --namespace velero --create-namespace -f /tmp/velero-restore-us-west-2.yaml

Now check that the Velero backup location is configured correctly:

velero backup-location get
NAME      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
default   aws        velerobkptest   Available   2023-02-20 00:21:44 +0530 IST   ReadWrite     true

With both clusters set up, we can run the backup and restore commands against us-east-2 and us-west-2 accordingly:

velero backup create zookeeper-z --include-namespaces zookeeper

Check the status with:

velero backup describe zookeeper-z --details

Restore from the us-east-2 region into us-west-2:

velero restore create --from-backup zookeeper-z

The restore should succeed, and the pods should come up Running, attached to the volumes they need:

kubectl get pods -n zookeeper
NAME          READY   STATUS    RESTARTS   AGE
zookeeper-0   2/2     Running   0          17h
zookeeper-1   2/2     Running   1          17h
zookeeper-2   2/2     Running   1          17h
zookeeper-3   2/2     Running   1          17h
zookeeper-4   2/2     Running   0          17h
zookeeper-5   2/2     Running   0          17h
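
The zones of the restored PVs can also be verified to confirm they were re-created in us-west-2 (the label key can vary with the storage driver):

kubectl get pv -o custom-columns='NAME:.metadata.name,ZONE:.metadata.labels.topology\.kubernetes\.io/zone'
kubectl get pvc -n zookeeper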

This assumes you have already completed the steps to create the IAM user for Velero and the S3 bucket; see the velero-plugin-for-aws README for the IAM user and S3 bucket configuration.
