Has anyone successfully enabled VM diagnostics using azurerm_virtual_machine_extension?

fae0ux8s  posted on 2023-01-21  in  Mac

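Enabling VM diagnostics in Azure is a real pain. I have gotten it working with ARM templates, the Azure PowerShell SDK, and the Azure CLI. But for several days now I have been trying to enable VM diagnostics for both Windows and Linux VMs using Terraform and the azurerm_virtual_machine_extension resource, and it still does not work.
Here is what I have so far (I made some adjustments to simplify this post, so hopefully my manual edits have not broken anything):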

resource "azurerm_virtual_machine_extension" "vm-linux" {
  count                      = "${local.is_windows_vm == "false" ? 1 : 0}"
  depends_on                 = ["azurerm_virtual_machine_data_disk_attachment.vm"]
  name                       = "LinuxDiagnostic"
  location                   = "${var.location}"
  resource_group_name        = "${var.resource_group_name}"
  virtual_machine_name       = "${local.vm_name}"
  publisher                  = "Microsoft.Azure.Diagnostics"
  type                       = "LinuxDiagnostic"
  type_handler_version       = "3.0"
  auto_upgrade_minor_version = "true"

  # The JSON file referenced below was created by running "az vm diagnostics get-default-config", and adding/verifying the "__DIAGNOSTIC_STORAGE_ACCOUNT__" and "__VM_RESOURCE_ID__" placeholders.
  settings = <<SETTINGS
    {
      "ladCfg": "${base64encode(replace(replace(file("${path.module}/.diag-settings/linux_diag_config.json"), "__DIAGNOSTIC_STORAGE_ACCOUNT__", "${module.vm_storage_account.name}"), "__VM_RESOURCE_ID__", "${local.metricsresourceid}"))}",
      "storageAccount": "${module.vm_storage_account.name}"
    }
SETTINGS

  # SAS token below: Do not include the leading question mark, as per https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/diagnostics-linux.
  protected_settings = <<SETTINGS
    {
      "storageAccountName": "${module.vm_storage_account.name}",
      "storageAccountSasToken": "${replace(data.azurerm_storage_account_sas.current.sas, "/^\\?/", "")}",
      "storageAccountEndPoint": "https://core.windows.net/"
    }
SETTINGS
}

resource "azurerm_virtual_machine_extension" "vm-win" {
  count                      = "${local.is_windows_vm == "true" ? 1 : 0}"
  depends_on                 = ["azurerm_virtual_machine_data_disk_attachment.vm"]
  name                       = "Microsoft.Insights.VMDiagnosticsSettings"
  location                   = "${var.location}"
  resource_group_name        = "${var.resource_group_name}"
  virtual_machine_name       = "${local.vm_name}"
  publisher                  = "Microsoft.Azure.Diagnostics"
  type                       = "IaaSDiagnostics"
  type_handler_version       = "1.9"
  auto_upgrade_minor_version = "true"

  # The JSON file referenced below was created by running "az vm diagnostics get-default-config --is-windows-os", and adding/verifying the "__DIAGNOSTIC_STORAGE_ACCOUNT__" and "__VM_RESOURCE_ID__" placeholders.
  settings = <<SETTINGS
    {
      "wadCfg": "${base64encode(replace(replace(file("${path.module}/.diag-settings/windows_diag_config.json"), "__DIAGNOSTIC_STORAGE_ACCOUNT__", "${module.vm_storage_account.name}"), "__VM_RESOURCE_ID__", "${local.metricsresourceid}"))}",
      "storageAccount": "${module.vm_storage_account.name}"
    }
SETTINGS

  protected_settings = <<SETTINGS
    {
      "storageAccountName": "${module.vm_storage_account.name}",
      "storageAccountSasToken": "${data.azurerm_storage_account_sas.current.sas}",
      "storageAccountEndPoint": "https://core.windows.net/"
    }
SETTINGS
}

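Note that for both Linux and Windows I load the diagnostics details from JSON files in the code base, as described in the comments. These are the default configurations provided by Azure, so they should be valid.
When I deploy these, the Linux VM extension deploys successfully, but in the Azure portal the extension shows "Problem detected in generated mdsd configuration". If I look at the VM's "Diagnostic settings", it says "Encountered an error: TypeError: Object doesn't support property or method 'diagnosticMonitorConfiguration'". The Windows VM extension fails to deploy entirely, saying that it "Failed to read configuration". If I view the extension in the portal, it shows the following error: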

"code": "ComponentStatus//failed/-3",
"level": "Error",
"displayStatus": "Provisioning failed",
"message": "Error starting the diagnostics extension"

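If I look at the "Diagnostic settings" pane, it just hangs with a never-ending "..." animation.
However, if I look at the "terraform apply" output for both VM extensions, the decoded settings look exactly as expected: they match the configuration files, with the placeholders correctly replaced.
Any suggestions on how to get this working?
Thanks in advance!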

mrfwxfqh #1

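So far I have gotten Windows diagnostics working 100% in our environment. The AzureRM API appears to be very picky about the configuration it is sent. We had been using PowerShell to enable diagnostics, and the same xmlCfg that works in PowerShell did not work with Terraform. This is what has worked for us so far. (The key names in settings/protected_settings are case sensitive: xmlCfg works, xmlcfg does not.)
main.tf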

#########################################################
#  VM Extensions - Windows In-Guest Monitoring/Diagnostics
#########################################################
resource "azurerm_virtual_machine_extension" "InGuestDiagnostics" {
  name                       = var.compute["InGuestDiagnostics"]["name"]
  location                   = azurerm_resource_group.VMResourceGroup.location
  resource_group_name        = azurerm_resource_group.VMResourceGroup.name
  virtual_machine_name       = azurerm_virtual_machine.Compute.name
  publisher                  = var.compute["InGuestDiagnostics"]["publisher"]
  type                       = var.compute["InGuestDiagnostics"]["type"]
  type_handler_version       = var.compute["InGuestDiagnostics"]["type_handler_version"]
  auto_upgrade_minor_version = var.compute["InGuestDiagnostics"]["auto_upgrade_minor_version"]

  settings           = <<SETTINGS
    {
      "xmlCfg": "${base64encode(templatefile("${path.module}/templates/wadcfgxml.tmpl", { vmid = azurerm_virtual_machine.Compute.id }))}",
      "storageAccount": "${data.azurerm_storage_account.InGuestDiagStorageAccount.name}"
    }
SETTINGS
  protected_settings = <<PROTECTEDSETTINGS
    {
      "storageAccountName": "${data.azurerm_storage_account.InGuestDiagStorageAccount.name}",
      "storageAccountKey": "${data.azurerm_storage_account.InGuestDiagStorageAccount.primary_access_key}",
      "storageAccountEndPoint": "https://core.windows.net"
    }
PROTECTEDSETTINGS
}

tfvars

InGuestDiagnostics = {
    name                       = "WindowsDiagnostics"
    publisher                  = "Microsoft.Azure.Diagnostics"
    type                       = "IaaSDiagnostics"
    type_handler_version       = "1.16"
    auto_upgrade_minor_version = "true"
  }

wadcfgxml.tmpl (I removed some of the perf counters for brevity)

<WadCfg>
    <DiagnosticMonitorConfiguration overallQuotaInMB="5120">
        <DiagnosticInfrastructureLogs scheduledTransferLogLevelFilter="Error"/>
        <Metrics resourceId="${vmid}">
            <MetricAggregation scheduledTransferPeriod="PT1H"/>
            <MetricAggregation scheduledTransferPeriod="PT1M"/>
        </Metrics>
        <PerformanceCounters scheduledTransferPeriod="PT1M">
            <PerformanceCounterConfiguration counterSpecifier="\Processor Information(_Total)\% Processor Time" sampleRate="PT60S" unit="Percent" />
            <PerformanceCounterConfiguration counterSpecifier="\Processor Information(_Total)\% Privileged Time" sampleRate="PT60S" unit="Percent" />
            <PerformanceCounterConfiguration counterSpecifier="\Processor Information(_Total)\% User Time" sampleRate="PT60S" unit="Percent" />
            <PerformanceCounterConfiguration counterSpecifier="\Processor Information(_Total)\Processor Frequency" sampleRate="PT60S" unit="Count" />
            <PerformanceCounterConfiguration counterSpecifier="\System\Processes" sampleRate="PT60S" unit="Count" />
            <PerformanceCounterConfiguration counterSpecifier="\SQLServer:SQL Statistics\SQL Re-Compilations/sec" sampleRate="PT60S" unit="Count" />
        </PerformanceCounters>

        <WindowsEventLog scheduledTransferPeriod="PT1M">
            <DataSource name="Application!*[System[(Level = 1 or Level = 2)]]"/>
            <DataSource name="Security!*[System[(Level = 1 or Level = 2)]]"/>
            <DataSource name="System!*[System[(Level = 1 or Level = 2)]]"/>
        </WindowsEventLog>
    </DiagnosticMonitorConfiguration>
</WadCfg>
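
Because the key names in settings are case sensitive and the JSON above is assembled by hand in a heredoc, one option is to let Terraform's built-in jsonencode produce the JSON instead. The following is a minimal sketch under assumptions, not the configuration this answer actually used: the variable names windows_vm_id and diag_storage_account_name are hypothetical, and the template path is the one shown in this answer.

```hcl
# Hypothetical inputs, named only for this sketch.
variable "windows_vm_id" {
  type = string
}

variable "diag_storage_account_name" {
  type = string
}

locals {
  # jsonencode keeps the key casing exactly as written (xmlCfg, storageAccount)
  # and takes care of quoting, so no hand-built JSON heredoc is needed.
  wad_settings = jsonencode({
    xmlCfg         = base64encode(templatefile("${path.module}/templates/wadcfgxml.tmpl", { vmid = var.windows_vm_id }))
    storageAccount = var.diag_storage_account_name
  })
}
```

local.wad_settings could then be assigned to the settings argument of the extension resource.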

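I finally got the Linux In-Guest Diagnostics (LAD) working. A few notable facts: unlike Windows diagnostics, the settings need to be passed as plain JSON, without base64 encoding. LAD also seems to require a SAS token for the storage account. The usual caveats still apply: the AzureRM API is picky about the configuration, and the setting names are case sensitive. Here is what I have working so far.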

# Locals
locals {
  env                  = var.workspace[terraform.workspace]
  # Use a set/static time to avoid TF from recreating the SAS token every apply, which would then cause it to
  # modify/recreate anything that uses it. Not ideal, but the token is for a VERY long time, so it will do for now
  sas_begintime = "2019-11-22T00:00:00Z"
  sas_endtime = timeadd(local.sas_begintime, "873600h")
}

#########################################################
#  VM Extensions - In-Guest Diagnostics
#########################################################
# We need a SAS token for the In-Guest Metrics
data "azurerm_storage_account_sas" "inguestdiagnostics" {
  count             = (contains(keys(local.env), "InGuestDiagnostics") ? 1 : 0)
  connection_string = data.azurerm_storage_account.BootDiagStorageAccount.primary_connection_string
  https_only        = true

  resource_types {
    service   = true
    container = true
    object    = true
  }

  services {
    blob  = true
    queue = true
    table = true
    file  = true
  }

  start  = local.sas_begintime
  expiry = local.sas_endtime

  permissions {
    read    = true
    write   = true
    delete  = true
    list    = true
    add     = true
    create  = true
    update  = true
    process = true
  }
}

resource "azurerm_virtual_machine_extension" "inguestdiagnostics" {
  for_each = contains(keys(local.env), "InGuestDiagnostics") ? local.env["InGuestDiagnostics"] : {}
  depends_on = [azurerm_virtual_machine_extension.dependencyagent]

  name                       = each.value["name"]
  location                   = azurerm_resource_group.resourcegroup.location
  resource_group_name        = azurerm_resource_group.resourcegroup.name
  virtual_machine_name       = azurerm_virtual_machine.compute["${each.key}"].name
  publisher                  = each.value["publisher"]
  type                       = each.value["type"]
  type_handler_version       = each.value["type_handler_version"]
  auto_upgrade_minor_version = each.value["auto_upgrade_minor_version"]

  settings           = templatefile("${path.module}/templates/ladcfg2json.tmpl", { vmid = azurerm_virtual_machine.compute["${each.key}"].id, storageAccountName = data.azurerm_storage_account.BootDiagStorageAccount.name })
  protected_settings = <<PROTECTEDSETTINGS
    {
      "storageAccountName": "${data.azurerm_storage_account.BootDiagStorageAccount.name}",
      "storageAccountSasToken": "${replace(data.azurerm_storage_account_sas.inguestdiagnostics.0.sas, "/^\\?/", "")}"
    }
PROTECTEDSETTINGS
}
# These variations didn't work for me:
# "ladCfg": "${templatefile("${path.module}/templates/ladcfgjson.tmpl", { vmid = azurerm_virtual_machine.compute["${each.key}"].id, storageAccountName = data.azurerm_storage_account.BootDiagStorageAccount.name })}",
# - This one gets you Error: "settings" contains an invalid JSON: invalid character '\n' in string literal or Error: "settings" contains an invalid JSON: invalid character 'S' after object key:value pair

# "ladCfg": "${replace(data.local_file.ladcfgjson["${each.key}"].content, "/\\n/", "")}",
# - This one gets you Error: "settings" contains an invalid JSON: invalid character 'S' after object key:value pair
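
The replace call with the regular expression /^\\?/ in the protected settings above strips a leading question mark from the SAS token. Below is a minimal sketch of the same step using the built-in trimprefix function, assuming a Terraform version that provides it. The token value is a made-up placeholder, not a real SAS.

```hcl
locals {
  # Hypothetical SAS value for illustration only; a real one would come from
  # the sas attribute of an azurerm_storage_account_sas data source.
  example_sas = "?sv=2017-07-29&ss=bqtf&sig=EXAMPLE"

  # trimprefix removes the literal "?" only when it is at the very start
  # of the string, so no regular expression or escaping is involved.
  sas_without_question_mark = trimprefix(local.example_sas, "?")
}

output "sas_without_question_mark" {
  value = local.sas_without_question_mark
}
```

Either form should yield the same token. trimprefix simply avoids the double-escaped regular expression.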

tfvars

workspace = {
  TerraformWorkSpaceName = {
    compute = {
      # Add additional key/objects for additional Compute
      computer01 = {
        name       = "computer01"
      }
    }
    InGuestDiagnostics = {
      # Add additional key/objects for each Compute you want to install the InGuestDiagnostics on
      computer01 = {
        name                       = "LinuxDiagnostic"
        publisher                  = "Microsoft.Azure.Diagnostics"
        type                       = "LinuxDiagnostic"
        type_handler_version       = "3.0"
        auto_upgrade_minor_version = "true"
      }
    }
  }
}

I could not get the template file to work without wrapping the whole thing in jsonencode.
ladcfg2json.tmpl

${jsonencode({
  "StorageAccount": "${storageAccountName}",
  "ladCfg": {
    "sampleRateInSeconds": 15,
    "diagnosticMonitorConfiguration": {
        "metrics": {
            "metricAggregation": [
                {
                    "scheduledTransferPeriod": "PT1M"
                },
                {
                    "scheduledTransferPeriod": "PT1H"
                }
            ],
            "resourceId": "${vmid}"
        },
        "eventVolume": "Medium",
        "performanceCounters": {
            "sinks": "",
            "performanceCounterConfiguration": [
                {
                    "counterSpecifier": "/builtin/processor/percentiowaittime",
                    "condition": "IsAggregate=TRUE",
                    "sampleRate": "PT15S",
                    "annotation": [
                        {
                            "locale": "en-us",
                            "displayName": "CPU IO wait time"
                        }
                    ],
                    "unit": "Percent",
                    "class": "processor",
                    "counter": "percentiowaittime",
                    "type": "builtin"
                }
            ]
        },
        "syslogEvents": {
            "syslogEventConfiguration": {
                "LOG_LOCAL0": "LOG_DEBUG"
            }
        }
    }
  }
})}
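
An alternative to wrapping the template body in jsonencode is to build the same structure directly in HCL and encode it once. This is a minimal sketch, assuming hypothetical variables linux_vm_id and lad_storage_account_name, and it reproduces only a subset of the keys from the template above (the performance counters are left out):

```hcl
# Hypothetical inputs, named only for this sketch.
variable "linux_vm_id" {
  type = string
}

variable "lad_storage_account_name" {
  type = string
}

locals {
  # The object is plain HCL; jsonencode turns it into the JSON string that
  # the settings argument expects. Key casing is preserved as written.
  lad_settings = jsonencode({
    StorageAccount = var.lad_storage_account_name
    ladCfg = {
      sampleRateInSeconds = 15
      diagnosticMonitorConfiguration = {
        eventVolume = "Medium"
        metrics = {
          resourceId = var.linux_vm_id
          metricAggregation = [
            { scheduledTransferPeriod = "PT1M" },
            { scheduledTransferPeriod = "PT1H" },
          ]
        }
        syslogEvents = {
          syslogEventConfiguration = {
            LOG_LOCAL0 = "LOG_DEBUG"
          }
        }
      }
    }
  })
}
```

This keeps the configuration in one file and avoids raw newlines or unescaped quotes ending up inside a JSON string, at the cost of no longer being able to paste the default JSON config unchanged.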

I hope this helps.

xxb16uws #2

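Since this was asked over a year ago, this is for anyone who, like me, is trying this for the first time. We only use Linux VMs, so this advice applies to those:
1. Protected settings should use PROTECTED_SETTINGS rather than SETTINGS (you can see this in the answer from @rv23 above).
2. From the documentation I followed, https://learn.microsoft.com/en-gb/azure/virtual-machines/extensions/diagnostics-linux#protected-settings, you can see that you need to specify storageAccountSasToken rather than storageAccountKey.
Here is my amended config (replace all the uppercase bits with your own settings):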

resource "azurerm_virtual_machine_extension" "vm_linux_diagnostics" {
    count = "1"

    name = "NAME"

        resource_group_name = "YOUR RESOURCE GROUP NAME"
        location            = "YOUR LOCATION"

        virtual_machine_name = "TARGET MACHINE NAME"

        publisher                  = "Microsoft.Azure.Diagnostics"
        type                       = "LinuxDiagnostic"
        type_handler_version       = "3.0"
        auto_upgrade_minor_version = "true"

        settings = <<SETTINGS
        {
            "StorageAccount": "tfnpfsnhsuk",
            "ladCfg": {
                "sampleRateInSeconds": 15,
                "diagnosticMonitorConfiguration": {
                    "metrics": {
                        "metricAggregation": [
                            {
                                "scheduledTransferPeriod": "PT1M"
                            },
                            {
                                "scheduledTransferPeriod": "PT1H"
                            }
                        ],
                        "resourceId": "VM ID"
                    },
                    "eventVolume": "Medium",
                    "performanceCounters": {
                        "sinks": "",
                        .... MORE METRICS - THAT YOU REQUIRE
            }
            }
        }
        SETTINGS

        protected_settings = <<PROTECTED_SETTINGS
        {
            "storageAccountName": "YOUR_ACCOUNT_NAME",
            "storageAccountSasToken": "YOUR SAS TOKEN"
        }
        PROTECTED_SETTINGS

        tags = "YOUR TAG"
        }

6g8kf2rb #3

I just dealt with a similar question:
Trying to add LinuxDiagnostic Azure VM Extension through terraform and getting errors
It covers obtaining the SAS token and reading the JSON file.

bwntbbo3 #4

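I have tried adding diagnostic logging to a Windows VM using the code below. When I run the pipeline, I get the error message: "VM has reported a failure when processing extension 'AzurePolicyforWindows'. Error message: "Failed to read configuration."\r\n\r\nM". Any suggestions on what might be missing here?
Here is main.tf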

resource "azurerm_virtual_machine_extension" "InGuestDiagnostics" {
  name                       = "AzurePolicyforWindows"
  virtual_machine_id         = "/subscriptions/xxxxxx/resourceGroups/xxxxxx/providers/Microsoft.Compute/virtualMachines/xxxbox0"
  publisher                  = "Microsoft.Azure.Diagnostics"
  type                       = "IaaSDiagnostics"
  type_handler_version       = "1.6"
  auto_upgrade_minor_version = "true"

  settings = templatefile(format("%s/diagnostics.json", path.module), {
    resource_id  = "/subscriptions/xxxxxx/resourceGroups/xxxxx/providers/Microsoft.Compute/virtualMachines/xxxbox0"
    storage_name = "xxxxxxauditlog"
  })
}

Here is diagnostics.json

{
    "StorageAccount": "${storage_name}",
    "WadCfg": {
      "DiagnosticMonitorConfiguration": {
        "DiagnosticInfrastructureLogs": {
          "scheduledTransferLogLevelFilter": "Error",
          "scheduledTransferPeriod": "PT1M"
        },
        "PerformanceCounters": {
            "PerformanceCounterConfiguration": [
              {
                "counterSpecifier": "\\Processor Information(_Total)\\% Processor Time",
                "sampleRate": "PT60S",
                "unit": "Percent"
              },            
              {
                "counterSpecifier": "\\Network Interface(*)\\Packets Received Errors",
                "sampleRate": "PT60S",
                "unit": "Count"
              }
            ],
            "scheduledTransferPeriod": "PT1M"
          },
        "WindowsEventLog": {
            "DataSource": [
              {
                "name": "Application!*[System[(Level = 1 or Level = 2 or Level = 3)]]"
              },
              {
                "name": "Security!*[System[band(Keywords,4503599627370496)]]"
              },
              {
                "name": "System!*[System[(Level = 1 or Level = 2 or Level = 3)]]"
              }
            ],
            "scheduledTransferPeriod": "PT1M"
          },
        "overallQuotaInMB": 5120
      }
    }
}
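
Compared with the working Windows example in the first answer, the resource above passes no protected_settings at all. Whether that is the cause of the "Failed to read configuration" error is an assumption, not something this thread confirms. A minimal sketch of the missing piece, modeled on the first answer and using hypothetical variable names:

```hcl
# Hypothetical inputs, named only for this sketch.
variable "audit_storage_account_name" {
  type = string
}

variable "audit_storage_account_key" {
  type      = string
  sensitive = true # assumes a Terraform version that supports sensitive variables
}

locals {
  # Same three keys, with the same casing, as the protected settings in the
  # first answer's Windows example.
  wad_protected_settings = jsonencode({
    storageAccountName     = var.audit_storage_account_name
    storageAccountKey      = var.audit_storage_account_key
    storageAccountEndPoint = "https://core.windows.net"
  })
}
```

local.wad_protected_settings could then be assigned to the protected_settings argument of the extension resource and the deployment retried.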
