PowerShell: converting multiple Excel workbooks (each containing multiple worksheets) to CSV: how do I extract a URL constructed by a formula?

Posted by uqcuzwp8 on 2023-05-22 in Shell

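I'm writing a PowerShell script that converts a collection of multi-sheet Excel xlsx workbooks into a single CSV file. One of the things I want to capture is the computed text of a hyperlink created by a HYPERLINK formula. For example, a cell contains =HYPERLINK(CONCATENATE("http://foo/bar.aspx?pid=",A2),"Click Here")
I can grab the cell with $currentCell = $sheet.Cells.Item($r, $c). I can grab the link text Click Here with $currentCell.Text. I can detect whether the cell has a formula by testing $currentCell.HasFormula. I can get the formula with $currentCell.Formula and parse it with a regular expression to detect whether it contains a HYPERLINK formula. What I want, however, is the result of executing the formula. I can execute the formula with $currentCell.Calculate(), but I can't figure out how to get the result (when I assign the result of $currentCell.Calculate() to a variable, the variable ends up as System.DBNull).
How can I programmatically get the result of a cell's Calculate method?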

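Update

After thinking about Benoît's answer, I realized that I didn't understand the basis of my own question. I was trying to generalize the handling of cells that contain formulas, but that doesn't work. The cell's formula is already being evaluated: when I read the text of a cell that has the HYPERLINK and CONCATENATE formulas (e.g. =HYPERLINK(CONCATENATE("http://foo/bar.aspx?pid=",A2),"Click Here")), I get Click Here, which is the result of executing the formula. What I need to do is detect and parse the HYPERLINK and CONCATENATE formulas and use the approach Benoît described.
Here is my code. It converts multiple Excel workbooks, each with multiple worksheets, and extracts the results of specific formulas in the worksheets I need to process. See the code after lines 136 and 145.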
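The detect-and-parse step described above can be tried in isolation, without Excel. The following is a minimal sketch rather than the code from the script: the function name Get-HyperlinkFormulaParts is hypothetical, and it assumes the formula has one of the two shapes mentioned in this question (a CONCATENATE of one quoted prefix and one A1-style cell reference, or a plain quoted URL).

```powershell
# Hypothetical helper (not part of the original script): splits a HYPERLINK
# formula into its literal URL text and, if present, the referenced cell.
function Get-HyperlinkFormulaParts {
    param([string]$Formula)

    # Form 1: =HYPERLINK(CONCATENATE("prefix",A2),"label")
    if ($Formula -match '^\s*=\s*HYPERLINK\(\s*CONCATENATE\(\s*"(?<Url>[^"]*)"\s*,\s*(?<Ref>\$?[A-Za-z]+\$?\d+)\s*\)\s*,') {
        return [pscustomobject]@{ Url = $Matches['Url']; CellRef = $Matches['Ref'] }
    }

    # Form 2: =HYPERLINK("url","label")
    if ($Formula -match '^\s*=\s*HYPERLINK\(\s*"(?<Url>[^"]*)"\s*[,)]') {
        return [pscustomobject]@{ Url = $Matches['Url']; CellRef = $null }
    }

    # Anything else is not a formula this sketch understands.
    return $null
}

$parts = Get-HyperlinkFormulaParts '=HYPERLINK(CONCATENATE("http://foo/bar.aspx?pid=",A2),"Click Here")'
$parts.Url      # http://foo/bar.aspx?pid=
$parts.CellRef  # A2
```

With a live worksheet, the full URL would then be $parts.Url plus $sheet.Range($parts.CellRef).Text, which is the same combination the script performs.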

Code. Updated on 5/7 with bug fixes and with code that detects and extracts data from specific formulas.

cls

#region Functions

Function Remove-WhiteSpaceFromNonQuoted($inString)
{

    $quoted = $false
    $newString = ""

    for ($i = 0; $i -lt $inString.Length; $i++)
    {
        if ($inString[$i] -eq "`"")
        {
            $quoted = $quoted -xor $true
        }

        if (($inString[$i] -match "\S" -and !$quoted) -or ($quoted))
        {
            $newString = $newString + $inString[$i] 
        }
    }

    return $newString
}

#endregion

$sortedFieldNameList = New-Object -TypeName System.Collections.SortedList

$fqBookNames = New-Object -TypeName System.Collections.SortedList

$fqBookNames.Add("C:\foo\bar1.xlsx", "")
$fqBookNames.Add("C:\foo\bar2.xlsx", "")
$fqBookNames.Add("C:\foo\barN.xlsx", "")

$global:workBook = $null
$global:excel =  $null

try
{   

    $global:excel = New-Object -Com Excel.Application
    $global:excel.Visible = $false

    write-host ("Scan for column names")

    #Scan all sheets in all books and create an object with all the column names encountered 
    foreach ($fqBookName in $fqBookNames.Keys)
    {
        $global:workBook = $global:excel.Workbooks.Open($fqBookName)

        foreach ($sheet in $global:workBook.Sheets)
        {
            $columnIndexMax = $sheet.UsedRange.Column + $sheet.UsedRange.Columns.Count - 1
            write-host ("Workbook=" + $global:workBook.Name + ". Sheet=" + $sheet.Name)
            $rowOne = $sheet.Rows(1)

            for ($columnIndex = 1; $columnIndex -le $columnIndexMax; $columnIndex++)
            {
                $columnName = $rowOne.Cells($columnIndex).Text.Trim().ToUpper()

                if ($columnName.Length -gt 0)
                {
                    if (!$sortedFieldNameList.ContainsKey($columnName)) 
                    {
                        $sortedFieldNameList.Add($columnName, "")
                    }
                }
                else
                {
                    break
                }
            }
        }

        $global:workBook.Close($false)
        Clear-Variable workBook
    }

    #Create a class that represents the worst-case collection of columns that will be output to, e.g., a grid or CSV file
    #https://stackoverflow.com/questions/49117127/create-a-class-with-dynamic-property-names-in-powershell
    Invoke-Expression @"
    Class ClsExportCsv {
    $(($sortedFieldNameList.Keys).ForEach({"[string] `${$($_)}`n "}))
    }
"@

    #create array to hold list of rows that will be output to, e.g., a grid or CSV file
    $itemList = New-Object System.Collections.ArrayList
    $itemList.clear()

    write-host ("Scan for data")

    foreach ($fqBookName in $fqBookNames.Keys)
    {
        $global:workBook = $global:excel.Workbooks.Open($fqBookName)

        foreach ($sheet in $global:workBook.Sheets)
        {
            write-host -NoNewline ("Workbook=" + $global:workBook.Name + ". Sheet=" + $sheet.Name + ". Rows=")

            $columnNameLookup = @{}
            $columnNameLookup.Clear()

            $columnIndexMax = $sheet.UsedRange.Column + $sheet.UsedRange.Columns.Count - 1
            $rowOne = $sheet.Rows(1)

            #create column name index lookup table for this sheet
            for ($columnIndex = 1; $columnIndex -le $columnIndexMax; $columnIndex++)
            {
                $columnNameLookup.Add($columnIndex, $rowOne.Cells($columnIndex).Text.Trim().ToUpper())
            }

            for ($rowIndex = 2; $rowIndex -le $sheet.Cells.EntireRow.Count; $rowIndex++)
            {
                $rowCurrent = $sheet.Rows($rowIndex)

                if (($rowCurrent.Cells(1).Text).Length -gt 0)
                {

                    $listRow = New-Object -TypeName ClsExportCsv

                    for ($columnIndex = 1; $columnIndex -le $columnIndexMax; $columnIndex++)
                    {
                        if (($columnNameLookup.$columnIndex).Length -gt 0)
                        {

                            $cellObject = $rowCurrent.Cells($columnIndex)
                            $textFromFormula = ""

                            if ($cellObject.HasFormula)
                            {
                                $formulaNoWhiteSpace = Remove-WhiteSpaceFromNonQuoted -inString $cellObject.Formula

                                #detect and parse cells with =HYPERLINK(CONCATENATE("http://xxxx.aspx?pid=",A2),"Click Here")
                                if ($formulaNoWhiteSpace -match '^(?:\=HYPERLINK\(CONCATENATE\(\")(?<URL>.*)(?:\"\,)(?<A1>.*)(?:\)\,.*)$')
                                {
                                    if (($Matches["URL"] -ne $null) -and ($Matches["A1"] -ne $null))
                                    {
                                        $textFromFormula = ($Matches["URL"] + $sheet.Range($Matches["A1"]).Text) 
                                    }
                                }

                                #detect and parse cells with =HYPERLINK("http://xxxx","Click Here")
                                if ($formulaNoWhiteSpace -match '^(?:\=HYPERLINK\(\")(?<URL>.*)(?:\"\,\".*\"\))$')
                                {
                                    if ($Matches["URL"] -ne $null)
                                    {
                                        $textFromFormula = $Matches["URL"] 
                                    }
                                }
                            }

                            if ($textFromFormula.Length -eq 0)
                            {
                                $listRow.($columnNameLookup.$columnIndex) = $rowCurrent.Cells($columnIndex).Text.Trim()
                            }
                            else
                            {
                                $listRow.($columnNameLookup.$columnIndex) = $textFromFormula
                            }

                        } # if (($columnNameLookup.$columnIndex).Length -gt 0)

                    } # for ($columnIndex = 1; ...

                    $itemList.Add($listRow) | out-null
                }
                else
                {
                    write-host ($rowIndex - 2).ToString()
                    break
                }

            } # for ($rowIndex = 2; .....

        } # foreach ($sheet in $global:workBook.Sheets)

        $global:workBook.Close($false)
        Clear-Variable workBook
    }

    $global:excel.Quit()
    Clear-Variable excel

    # Write the combined rows to CSV, then show the same rows in a grid view
    $itemList | Export-CSV -LiteralPath "C:\Users\foo\combined.csv" -NoTypeInformation -Encoding UTF8 -Delimiter ','
    $itemList | Out-GridView -Title "Rows"

}
finally
{

    if ($global:excel -ne $null)
    {
        if ($global:workBook -ne $null)
        {
            $global:workBook.Close($false)
        }

        $global:excel.Quit()
        Clear-Variable excel
    }
}
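As a side note on the dynamically generated ClsExportCsv class: one possible alternative that avoids Invoke-Expression is to build each row from an ordered dictionary. This is only a sketch under assumptions. The function name New-ExportRow and the sample column names and values are made up for illustration, and ConvertTo-Csv stands in for Export-CSV so that nothing is written to disk.

```powershell
# Assumed sample data: column names as the first scan pass would collect them.
$columnNames = @('ID', 'LINK', 'NAME')

# Hypothetical helper: creates one row object with every column present
# (empty by default), so each exported row has the same properties in the
# same order.
function New-ExportRow {
    param([string[]]$ColumnNames)

    $properties = [ordered]@{}
    foreach ($name in $ColumnNames) { $properties[$name] = '' }
    return [pscustomobject]$properties
}

$row = New-ExportRow -ColumnNames $columnNames
$row.ID   = '42'
$row.LINK = 'http://foo/bar.aspx?pid=42'

# ConvertTo-Csv returns the CSV lines as strings instead of writing a file.
$csvLines = @($row) | ConvertTo-Csv -NoTypeInformation
$csvLines
```

Because every row carries all column names, the CSV header stays complete even when a particular sheet lacks some of the columns.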
Answer #1 (unftdfkk)

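It does not seem to be possible to directly get the address generated by the CONCATENATE function; see, for example, "Extract URL From Excel Hyperlink Formula".
Why would a regex-based solution like the one below not be suitable?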

$split = $currentCell.Formula -split 'CONCATENATE' | Select -Last 1 | %{$_ -replace `
'[" ()]','' -split ','}
$calculatedResult = $split[0] + $sheet.Range("$($split[1])").Text
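To see what the split produces, the same pipeline can be run on a literal formula string with no Excel involved. This is only an illustration; the $formula variable is a stand-in for $currentCell.Formula and uses the example formula from the question.

```powershell
# Stand-in for $currentCell.Formula, using the formula from the question.
$formula = '=HYPERLINK(CONCATENATE("http://foo/bar.aspx?pid=",A2),"Click Here")'

# Keep the text after CONCATENATE, strip quotes, spaces and parentheses,
# then split on commas.
$split = $formula -split 'CONCATENATE' |
    Select-Object -Last 1 |
    ForEach-Object { $_ -replace '[" ()]', '' -split ',' }

$split[0]  # http://foo/bar.aspx?pid=
$split[1]  # A2
$split[2]  # ClickHere
```

Note that the character class also removes spaces and parentheses inside the URL and the link text (Click Here becomes ClickHere), and a comma inside the URL would shift the elements, so this approach fits formulas shaped like the example rather than arbitrary ones.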
