Join every other line in PowerShell

nvbavucw  posted on 2023-02-23  in  Shell

I want to join every second line of the input below onto the line before it. Here is the input:

ALPHA-FETOPROTEIN      ROUTINE    CH 0203 001   02/03/2023@10:45 LIVERF3
     ###-##-####    #######,####        In lab
ALPHA-FETOPROTEIN      ROUTINE    CH 0203 234   02/03/2023@11:05 LIVER
     ###-##-####    ########,########   In lab
ANION GAP              STAT       CH 0203 124   02/03/2023@11:06 DAY
     ###-##-####    ######,##### ####   In lab
BASIC METABOLIC PANE   ROUTINE    CH 0203 001   02/03/2023@10:45 LIVERF3
     ###-##-####    #######,#### ###### In lab

This is the desired output:

ALPHA-FETOPROTEIN      ROUTINE    CH 0203 001   02/03/2023@10:45 LIVERF3 ###-##-####    #######,####        In lab
ALPHA-FETOPROTEIN      ROUTINE    CH 0203 234   02/03/2023@11:05 LIVER ###-##-####    ########,########   In lab
ANION GAP              STAT       CH 0203 124   02/03/2023@11:06 DAY ###-##-####    ######,##### ####   In lab
BASIC METABOLIC PANE   ROUTINE    CH 0203 001   02/03/2023@10:45 LIVERF3 ###-##-####    #######,#### ###### In lab

The code I tried is:

for($i = 0; $i -lt $splitLines.Count; $i += 2){
  $splitLines[$i,($i+1)] -join ' '
}

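It comes from "Joining every two lines in Powershell output", but I can't seem to get it to work for me. I'm not very proficient with PowerShell, but I'm at the mercy of what's available to me at work.
Edit: Here is the full code I'm using, as requested.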

# SET VARIABLES
$inputfile = "C:\Users\Will\Desktop\testfile.txt"
$outputfile = "C:\Users\Will\Desktop\testfileformatted.txt"
$new_output = "C:\Users\Will\Desktop\new_formatted.txt"

# REMOVE EXTRA CHARACTERS
$remove_beginning_capture = "-------------------------------------------------------------------------------"
$remove_end_capture = "==============================================================================="
$remove_line = "------"
$remove_strings_with_spaces = "            \d"
Get-Content $inputfile | Where-Object {$_ -notmatch $remove_beginning_capture} | Where-Object {$_ -notmatch $remove_end_capture} | Where-Object {$_ -notmatch $remove_line} | Where-Object {$_ -notmatch $remove_strings_with_spaces}  | ? {$_.trim() -ne "" } | Set-Content $outputfile

# Measures line length for loop
$file_lines = gc $outputfile | Measure-Object

#Remove Whitespace
# $whitespace_removed = (Get-Content $outputfile -Raw) -replace '\s+', ' '| Set-Content -Path C:\Users\Will\Desktop\new_formatted.csv

# Combine every other line
$lines = Get-Content $outputfile -Raw
$newcontent = $lines.Replace("`n","")
Write-Host "Content: $newcontent"
$newcontent | Set-Content $new_output

for($i = 0; $i -lt $splitLines.Count; $i += 2){
  $splitLines[$i,($i+1)] -join ' '
}

u5i3ibmn1#

Just read two lines at a time, then write them out as one line:

$inputFilename = "c:\temp\test.txt"
$outputFilename = "c:\temp\test1.txt"

$reader = [System.IO.StreamReader]::new($inputFilename)
$writer = [System.IO.StreamWriter]::new($outputFilename)
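# $null goes on the left-hand side so the comparison is always a scalar one
while($null -ne ($line = $reader.ReadLine()))
{
   # Read the second line of the pair, if there is one
   $secondLine = ""
   if(!$reader.EndOfStream){ $secondLine = $reader.ReadLine() }

   # Join with a single space and drop the second line's leading indentation,
   # so that the result matches the desired output in the question
   $writer.WriteLine(($line + ' ' + $secondLine.TrimStart()).TrimEnd())
}
$reader.Close()
$writer.Close()   # Close() also flushes any buffered output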

41zrol4v2#

PowerShell-idiomatic solutions:

Use Get-Content with -ReadCount 2 in order to read the lines from your file in pairs, which allows you to process each pair in a ForEach-Object call, where the constituent lines can be joined to form a single output line.

Get-Content -ReadCount 2 yourFile.txt | 
  ForEach-Object { $_[0] + ' ' +  $_[1].TrimStart() }

The above directly outputs the resulting lines (as the for command in your question does), causing them to print to the display by default.
Pipe to Set-Content to save the output to a file:

Get-Content -ReadCount 2 yourFile.txt | 
  ForEach-Object { $_[0] + ' ' +  $_[1].TrimStart() } |
  Set-Content yourOutputFile.txt
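
The commands above assume that the file has an even number of lines. With an odd count, the last batch contains a single line, `$_[1]` is `$null`, and the `.TrimStart()` call fails. A minimal sketch of a more tolerant variant follows; the function name `Join-LinePair` and the sample file are hypothetical and serve only as an illustration:

```powershell
# Hypothetical helper (not part of the original answer): joins lines in pairs
# and passes a trailing unpaired line through unchanged.
function Join-LinePair {
  param([Parameter(Mandatory)] [string] $Path)

  Get-Content -ReadCount 2 -LiteralPath $Path | ForEach-Object {
    # @() ensures we always index into an array, even for a one-line batch
    $pair = @($_)
    if ($pair.Count -eq 2) { $pair[0] + ' ' + $pair[1].TrimStart() }
    else                   { $pair[0] }
  }
}

# Made-up sample data: two complete pairs plus one unpaired line.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'join-pair-sample.txt'
Set-Content -LiteralPath $sample -Value 'A1', '     A2', 'B1', '     B2', 'C1'

$joined = @(Join-LinePair -Path $sample)
$joined
```

Whether an unpaired last line should be passed through, dropped, or reported as an error depends on the data; passing it through is just one possible choice.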

Performance notes:

  • Unfortunately (as of PowerShell 7.3.2), Get-Content is quite slow by default - see GitHub issue #7537, and the performance of ForEach-Object and Where-Object could be improved too - see GitHub issue #10982.
  • At the expense of collecting all inputs and outputs in memory first, you can noticeably improve the performance with the following variation, which avoids the ForEach-Object cmdlet in favor of the intrinsic .ForEach() method, and, instead of piping to Set-Content, passes all output lines via the -Value parameter:
Set-Content yourOutputFile.txt -Value (
  (Get-Content -ReadCount 2 yourFile.txt).ForEach({ $_[0] + ' ' + $_[1].TrimStart() })
)
  • Read on for even faster alternatives, but remember that optimizations are only worth undertaking if actually needed - if the first PowerShell-idiomatic solution above is fast enough in practice, it is worth using for its conceptual elegance and concision.
  • See this Gist for benchmarks that compare the relative performance of the solutions in this answer as well as that of the solution from jdweng's .NET API-based answer.

A better-performing alternative is to use a switch statement with the -File parameter to process files line by line:

$i = 1
switch -File yourFile.txt {
  default {
    if ($i++ % 2) { $firstLineInPair = $_ }
    else          { $firstLineInPair + ' ' + $_.TrimStart() } 
  }
}

The helper index variable $i and the modulo operation (%) are simply used to identify which line is the start of a (new) pair, and which one is its second half.
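
To see the counter logic in isolation, here is a small sketch that runs the same switch body over an in-memory array instead of a file (switch enumerates arrays too). The sample lines and the `$result` variable are made up for illustration:

```powershell
# Illustrative input (made up); each string is processed like one line of a file.
$lines = 'first', '  second', 'third', '  fourth'

$i = 1
$result = switch ($lines) {
  default {
    # $i++ yields the current value and then increments it: 1, 2, 3, 4.
    # Odd values leave remainder 1 (truthy), even values remainder 0 (falsy).
    if ($i++ % 2) { $firstLineInPair = $_ }                    # lines 1 and 3: remember only
    else          { $firstLineInPair + ' ' + $_.TrimStart() }  # lines 2 and 4: emit the pair
  }
}
$result
```

The assignment in the first branch produces no output, so only the joined pairs end up in `$result`.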

  • The switch statement is itself streaming, but it cannot be used as-is as pipeline input. By enclosing it in & { ... }, it can, but that forfeits some of the performance benefits, making it only marginally faster than the optimized Get-Content -ReadCount 2 solution:
& {
  $i = 1
  switch -File yourFile.txt {
    default {
      if ($i++ % 2) { $firstLineInPair = $_ }
      else          { $firstLineInPair + ' ' + $_.TrimStart() } 
    }
  } 
} | Set-Content yourOutputFile.txt
  • For the best performance when writing to a file, use Set-Content $outFile -Value $(...), albeit at the expense of collecting all output lines in memory first:
Set-Content yourOutputFile.txt -Value $(
  $i = 1
  switch -File yourFile.txt {
    default {
      if ($i++ % 2) { $firstLineInPair = $_ }
      else          { $firstLineInPair + ' ' + $_.TrimStart() } 
    }
  } 
)

The fastest and most concise solution is to use a regex-based approach, which reads the entire file up front:

(Get-Content -Raw yourFile.txt) -replace '(.+)\r?\n(?: *)(.+\r?\n)', '$1 $2'

Note:

  • The assumption is that all lines are paired, and that the last line has a trailing newline.
  • The -replace operation matches two consecutive lines, and joins them together with a space, ignoring leading spaces on the second line. For a detailed explanation of the regex and the ability to interact with it, see this regex101.com page.
  • To save the output to a file, you can pipe directly to Set-Content:
(Get-Content -Raw yourFile.txt) -replace '(.+)\r?\n(?: *)(.+\r?\n)', '$1 $2' |
  Set-Content yourOutputFile.txt
  • In this case, because the pipeline input to Set-Content is provided by an expression that doesn't involve for-every-input-line calls to script blocks ( { ... } ) (as the switch solution requires), there is virtually no slowdown resulting from use of the pipeline (whose use is generally preferable for conceptual elegance and concision).
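
One caveat with files that use Windows-style (CRLF) line endings: in .NET regexes, `.` matches every character except `\n`, so `(.+)` also captures the carriage return at the end of the first line, and that CR ends up in the middle of the joined line. The following sketch shows a possible CRLF-safe variant; the modified pattern and the sample text are my own assumptions, not part of the original answer:

```powershell
# Made-up sample text with CRLF line endings.
$text = "A1`r`n     A2`r`nB1`r`n     B2`r`n"

# [^\r\n]+ keeps the carriage return out of the first capture group.
$joined = $text -replace '([^\r\n]+)\r?\n *(.+\r?\n)', '$1 $2'

# For contrast: the pattern from the answer, applied to the same CRLF input.
$original = $text -replace '(.+)\r?\n(?: *)(.+\r?\n)', '$1 $2'

$joined -eq "A1 A2`r`nB1 B2`r`n"    # True
$original.Contains("A1`r A2")       # True: a stray CR remains inside the line
```

With LF-only input both patterns produce the same result, so the difference only matters for CRLF files.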

As for what you tried:
The $splitLines-based solution in your question is predicated on having assigned all lines of the input file to this self-chosen variable as an array, which your code does not do.
While you could fill the variable $splitLines with an array of lines from your input file with $splitLines = Get-Content yourFile.txt (Get-Content reads text files line by line by default), the switch-based line-by-line solution is more efficient and streams its results. When the results are saved to a file, streaming keeps memory usage constant, which matters with large input sets (though rarely with text files).
A performance tip when reading all lines at once into an array with Get-Content: use -ReadCount 0, which greatly speeds up the operation:
$splitLines = Get-Content -ReadCount 0 yourFile.txt
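
Putting the two points together, the loop from the question could be completed roughly as follows. This is a sketch under assumptions: the sample file path and its contents are made up, the file is assumed to contain an even number of lines, and `.TrimStart()` is added (it is not in the question's loop) so that the result matches the desired output:

```powershell
# Made-up sample file standing in for the cleaned-up input file.
$inFile = Join-Path ([System.IO.Path]::GetTempPath()) 'split-lines-sample.txt'
Set-Content -LiteralPath $inFile -Value 'ANION GAP  STAT', '     ###-##-####  In lab', 'BASIC PANEL  ROUTINE', '     ###-##-####  In lab'

# The step missing from the question: actually fill $splitLines with the lines.
$splitLines = Get-Content -ReadCount 0 -LiteralPath $inFile

# Same loop as in the question, but trimming the second line's indentation.
$result = for ($i = 0; $i -lt $splitLines.Count; $i += 2) {
  $splitLines[$i] + ' ' + $splitLines[$i + 1].TrimStart()
}
$result
```

Piping `$result` to Set-Content would then write the joined lines to a file.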
