Reading a CSV file in which some fields are empty and values are not consistently formatted?

q9yhzks0  posted on 2023-06-03  in Other

I have a CSV file (ABC.CSV) with data in the following format:

COLUMN1 COLUMN2 COLUMN3      COLUMN4     COLUMN5
12345   ABC     RR,MM K      NAO,KUM      DEV
34567   CDEF    NN                        INT
89567   KGH     PP, BHIM     PRKC         PROD
9876    PIM                               DEV          
6543    KCDEF   NICE,MAN K                INT  
5432    GHK                  SIN,NICE C   PROD

Below is the PowerShell code I use to read this CSV file:

$filePath = "C:\path\to\your\abc.csv"
$searchString = "9876"

# Read the content of the file
$content = Get-Content -Path $filePath

# Process each line of the file
foreach ($line in $content) {
    # Split the line into individual values
    $values = $line -split ','

    # Extract the values
    $column1 = $values[0].Trim()
    $column2 = $values[1].Trim()
    $column3 = ($values[2].Trim('"'), $values[3].Trim('"')) -join ','
    $column4 = $values[4].Trim()
    $column5 = $values[5].Trim()
    
    # Print the row only when both search conditions match
    if ($column1 -like $searchString -and $column5 -like "PROD") {
        Write-Host $column1
        Write-Host $column2
        Write-Host $column3
        Write-Host $column4
        Write-Host $column5
    }
}

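This code works only when every field is fully populated. If any field is empty, or if columns 4 and 5 hold a single value such as CC rather than a pair such as CC,KK, it throws an error.
It works fine for this row: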

12345   ABC     RR,MM K      NAO,KUM      DEV

It does not show the correct result for these rows:

34567   CDEF    NN                        INT
89567   KGH     PP, BHIM     PRKC         PROD
9876    PIM                               DEV
6543    KCDEF   NICE,MAN K                INT
5432    GHK                  SIN,NICE C   PROD

r9f1avp5 #1

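As noted, your data is *not* CSV; it is laid out in ***fixed-width columns***, whose boundaries are implied by the character positions at which the *column names* start.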
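A quick check makes the mismatch concrete. The sketch below, which uses two rows copied from the question and variable names introduced here only for illustration, shows that splitting on a comma cuts inside field values instead of between columns. Whenever a line yields fewer pieces than the question's code indexes, the missing element is $null, and calling .Trim() on it raises an error.

```powershell
# Two rows copied from the question's sample data.
$full  = '12345   ABC     RR,MM K      NAO,KUM      DEV'
$empty = '9876    PIM                               DEV'

# Splitting on ',' cuts inside field values, not between columns.
$fullParts  = $full -split ','
$emptyParts = $empty -split ','

$fullParts.Count    # 3 pieces, none of which is a whole column
$emptyParts.Count   # 1 piece: the entire line, so index 1 and up are $null
```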

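The following converts your file to CSV format and parses the result into *objects* with ConvertFrom-Csv. Note that the solution is *general*, based on the assumption above: it relies neither on a *specific number of columns* nor on their *specific widths*: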

# Read the file into the header line and all data lines.
$headerLine, $dataLines = Get-Content $filePath

# Get the indices of the characters that end the fields.
# + -1 adds an extra array element that is a placeholder for the end of the line.
$fieldEndIndices = [regex]::Matches($headerLine, ' \S').Index + -1

# Iterate over all data lines.
$objects =
  $dataLines |  
  ForEach-Object {
    # Split the line at hand into fields, trim each field and enclose it in "..."
    $pos = 0
    $fields = 
      foreach ($fieldEndIndex in $fieldEndIndices) {
        if ($fieldEndIndex -eq -1) { $fieldEndIndex = $_.Length - 1 } 
        '"' + $_.Substring($pos, $fieldEndIndex - $pos + 1).Trim() + '"'
        $pos += $fieldEndIndex - $pos + 1
      }
    # Output the fields as a CSV line
    $fields -join ','
  } |
  ConvertFrom-Csv -Header (-split $headerLine) # Parse the CSV data into objects.

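After running the code above, $objects contains an array of [pscustomobject] instances whose properties are named for the columns in the input data and whose values are the field values.
To visualize the result, run $objects | Format-Table, which produces the following output, showing that the data has been parsed as intended: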

COLUMN1 COLUMN2 COLUMN3    COLUMN4    COLUMN5
------- ------- -------    -------    -------
12345   ABC     RR,MM K    NAO,KUM    DEV
34567   CDEF    NN                    INT
89567   KGH     PP, BHIM   PRKC       PROD
9876    PIM                           DEV
6543    KCDEF   NICE,MAN K            INT
5432    GHK                SIN,NICE C PROD
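Since the question's goal was to look up a row by value, the parsed objects can then be filtered by property. This is a minimal sketch, assuming $objects has the shape shown above; the three stand-in rows are copied from that output so the snippet runs on its own. $environment is a name introduced here for illustration, and its DEV value is an assumption, chosen because the sample row for 9876 is a DEV row.

```powershell
# Stand-in for the parsed data; in practice $objects comes from the conversion code.
$objects = @(
    [pscustomobject]@{ COLUMN1 = '12345'; COLUMN2 = 'ABC'; COLUMN3 = 'RR,MM K'; COLUMN4 = 'NAO,KUM';    COLUMN5 = 'DEV' }
    [pscustomobject]@{ COLUMN1 = '9876';  COLUMN2 = 'PIM'; COLUMN3 = '';        COLUMN4 = '';           COLUMN5 = 'DEV' }
    [pscustomobject]@{ COLUMN1 = '5432';  COLUMN2 = 'GHK'; COLUMN3 = '';        COLUMN4 = 'SIN,NICE C'; COLUMN5 = 'PROD' }
)

$searchString = '9876'
$environment  = 'DEV'   # assumption: the value to match in COLUMN5

# Keep only the rows where both columns match; empty fields need no special handling.
$hits = @($objects | Where-Object {
    $_.COLUMN1 -eq $searchString -and $_.COLUMN5 -eq $environment
})

$hits | Format-List
```

Because every field is a named property, an empty COLUMN3 or COLUMN4 is simply an empty string and cannot shift the other values out of place.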

zzzyeukh #2

You have fixed-width data, not CSV. Try the regular expression below, which uses hard-coded column widths:

$filename = 'c:\temp\test.csv'
$pattern = '(?<column1>.{8})(?<column2>.{8})(?<column3>.{13})(?<column4>.{11})(?<column5>.*)'
$data = Get-Content -Path $filename | Select-Object -Skip 1 | Select-String -Pattern $pattern
$table = $data | foreach {[PSCustomObject]@{
   column1 = $_.Matches.Groups[1].Value.Trim() 
   column2 = $_.Matches.Groups[2].Value.Trim()
   column3 = $_.Matches.Groups[3].Value.Trim()
   column4 = $_.Matches.Groups[4].Value.Trim()
   column5 = $_.Matches.Groups[5].Value.Trim()
} }
$table | Format-Table

Result:

column1 column2 column3    column4    column5
------- ------- -------    -------    -------
12345   ABC     RR,MM K    NAO,KUM    DEV
34567   CDEF    NN                    INT
89567   KGH     PP, BHIM   PRKC       PROD
9876    PIM                           DEV
6543    KCDEF   NICE,MAN K            INT
5432    GHK                SIN,NICE C PROD
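If the column widths may change, the same kind of pattern could be generated from the header line instead of being typed by hand. The following is only a sketch under two assumptions: each column starts where its name starts in the header, and every column name is a valid regex group name (letters, digits, underscore). The variable names and the rebuilt sample row are introduced here for illustration.

```powershell
# Header line copied from the question's sample data.
$headerLine = 'COLUMN1 COLUMN2 COLUMN3      COLUMN4     COLUMN5'

# Each match is a column name plus the spaces that pad it up to the next name,
# so the match length is that column's width.
$columns = [regex]::Matches($headerLine, '\S+\s*')

# Build one named group per column; the last column takes the rest of the line.
$parts = for ($i = 0; $i -lt $columns.Count; $i++) {
    $name = $columns[$i].Value.Trim()
    if ($i -lt $columns.Count - 1) {
        '(?<{0}>.{{{1}}})' -f $name, $columns[$i].Length
    } else {
        '(?<{0}>.*)' -f $name
    }
}
$pattern = -join $parts

# Rebuild one sample row (5432 / GHK / empty / SIN,NICE C / PROD) with explicit padding.
$line  = '{0,-8}{1,-8}{2,-13}{3,-13}{4}' -f '5432', 'GHK', '', 'SIN,NICE C', 'PROD'
$match = [regex]::Match($line, $pattern)

# Collect the trimmed group values in column order.
$row = [ordered]@{}
foreach ($column in $columns) {
    $name = $column.Value.Trim()
    $row[$name] = $match.Groups[$name].Value.Trim()
}
[pscustomobject]$row
```

For the sample header this yields a width of 12 for COLUMN4 where the hand-written pattern uses 11; both parse the sample rows identically because every captured value is trimmed.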
