Join every other line in PowerShell

Posted by nvbavucw on 2023-02-23 in Shell


I want to join every other line of the input below with the line that precedes it. Here is the input.

ALPHA-FETOPROTEIN      ROUTINE    CH 0203 001   02/03/2023@10:45 LIVERF3
     ###-##-####    #######,####        In lab
ALPHA-FETOPROTEIN      ROUTINE    CH 0203 234   02/03/2023@11:05 LIVER
     ###-##-####    ########,########   In lab
ANION GAP              STAT       CH 0203 124   02/03/2023@11:06 DAY
     ###-##-####    ######,##### ####   In lab
BASIC METABOLIC PANE   ROUTINE    CH 0203 001   02/03/2023@10:45 LIVERF3
     ###-##-####    #######,#### ###### In lab

This is the desired output:

ALPHA-FETOPROTEIN      ROUTINE    CH 0203 001   02/03/2023@10:45 LIVERF3 ###-##-####    #######,####        In lab
ALPHA-FETOPROTEIN      ROUTINE    CH 0203 234   02/03/2023@11:05 LIVER ###-##-####    ########,########   In lab
ANION GAP              STAT       CH 0203 124   02/03/2023@11:06 DAY ###-##-####    ######,##### ####   In lab
BASIC METABOLIC PANE   ROUTINE    CH 0203 001   02/03/2023@10:45 LIVERF3 ###-##-####    #######,#### ###### In lab

The code I tried is:

for($i = 0; $i -lt $splitLines.Count; $i += 2){
  $splitLines[$i,($i+1)] -join ' '
}

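It comes from "Joining every two lines in Powershell output", but I can't seem to get it to work for me. I'm not very well versed in PowerShell, but I'm at the mercy of what is available to me at work.
Edit: Here is the full code I'm using, as requested.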

# SET VARIABLES
$inputfile = "C:\Users\Will\Desktop\testfile.txt"
$outputfile = "C:\Users\Will\Desktop\testfileformatted.txt"
$new_output = "C:\Users\Will\Desktop\new_formatted.txt"

# REMOVE EXTRA CHARACTERS
$remove_beginning_capture = "-------------------------------------------------------------------------------"
$remove_end_capture = "==============================================================================="
$remove_line = "------"
$remove_strings_with_spaces = "            \d"
Get-Content $inputfile | Where-Object {$_ -notmatch $remove_beginning_capture} | Where-Object {$_ -notmatch $remove_end_capture} | Where-Object {$_ -notmatch $remove_line} | Where-Object {$_ -notmatch $remove_strings_with_spaces}  | ? {$_.trim() -ne "" } | Set-Content $outputfile

# Measures line length for loop
$file_lines = gc $outputfile | Measure-Object

#Remove Whitespace
# $whitespace_removed = (Get-Content $outputfile -Raw) -replace '\s+', ' '| Set-Content -Path C:\Users\Will\Desktop\new_formatted.csv

# Combine every other line
$lines = Get-Content $outputfile -Raw
$newcontent = $lines.Replace("`n","")
Write-Host "Content: $newcontent"
$newcontent | Set-Content $new_output

for($i = 0; $i -lt $splitLines.Count; $i += 2){
  $splitLines[$i,($i+1)] -join ' '
}

powershell

Source: https://stackoverflow.com/questions/75347681/join-every-other-line-in-powershell

2 Answers


u5i3ibmn1#

Just read two lines at a time, then write them out as one line:

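$inputFilename = "c:\temp\test.txt"
$outputFilename = "c:\temp\test1.txt"

$reader = [System.IO.StreamReader]::new($inputFilename)
$writer = [System.IO.StreamWriter]::new($outputFilename)
# ReadLine() returns $null at end of file; $null goes on the left-hand side
# so that the comparison is always a scalar one.
while ($null -ne ($line = $reader.ReadLine()))
{
   # Read the second line of the pair, if there is one.
   $secondLine = ""
   if (!$reader.EndOfStream) { $secondLine = $reader.ReadLine() }

   # Join with a single space and drop the second line's indentation,
   # to match the desired output in the question.
   $writer.WriteLine($line + ' ' + $secondLine.TrimStart())
}
$reader.Close()
$writer.Flush()
$writer.Close()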


41zrol4v2#

PowerShell-idiomatic solutions:

Use Get-Content with -ReadCount 2 in order to read the lines from your file in pairs, which allows you to process each pair in a ForEach-Object call, where the constituent lines can be joined to form a single output line.

Get-Content -ReadCount 2 yourFile.txt | 
  ForEach-Object { $_[0] + ' ' +  $_[1].TrimStart() }

The above directly outputs the resulting lines (as the for command in your question does), causing them to print to the display by default.
Pipe to Set-Content to save the output to a file:

Get-Content -ReadCount 2 yourFile.txt | 
  ForEach-Object { $_[0] + ' ' +  $_[1].TrimStart() } |
  Set-Content yourOutputFile.txt
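
If the file may contain an odd number of lines, the last chunk holds only one line, so $_[1] is $null and calling .TrimStart() on it fails. A minimal sketch of a guarded variant follows; the temp file, the sample rows, and the names $tempIn, $pair and $joined are illustrative assumptions, not part of the question:

```powershell
# Create a throwaway input file with three lines (an odd count, on purpose).
$tempIn = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tempIn -Value @(
  'ANION GAP   STAT'
  '     ###-##-####   In lab'
  'UNPAIRED LINE'
)

# -ReadCount 2 sends the lines through the pipeline in chunks of up to two.
$joined = Get-Content -ReadCount 2 $tempIn | ForEach-Object {
  $pair = @($_)   # make sure we index into an array, even for a short final chunk
  if ($pair.Count -eq 2) { $pair[0] + ' ' + $pair[1].TrimStart() }  # normal pair
  else                   { $pair[0] }                               # trailing unpaired line
}

$joined
Remove-Item $tempIn
```

Whether an unpaired last line should be passed through, dropped, or reported as an error depends on the data; passing it through is just one choice.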

Performance notes:

Unfortunately (as of PowerShell 7.3.2), Get-Content is quite slow by default (see GitHub issue #7537), and the performance of ForEach-Object and Where-Object could be improved too (see GitHub issue #10982).
At the expense of collecting all inputs and outputs in memory first, you can noticeably improve the performance with the following variation, which avoids the ForEach-Object cmdlet in favor of the intrinsic .ForEach() method and, instead of piping to Set-Content, passes all output lines via the -Value parameter:

Set-Content $tempOutFile -Value (
  (Get-Content -ReadCount 2 $tempInFile).ForEach({ $_[0] + ' ' + $_[1].TrimStart() })
)

Read on for even faster alternatives, but remember that optimizations are only worth undertaking if actually needed - if the first PowerShell-idiomatic solution above is fast enough in practice, it is worth using for its conceptual elegance and concision.
See this Gist for benchmarks that compare the relative performance of the solutions in this answer as well as that of the solution from jdweng's .NET API-based answer.

A better-performing alternative is to use a switch statement with the -File parameter to process files line by line:

$i = 1
switch -File yourFile.txt {
  default {
    if ($i++ % 2) { $firstLineInPair = $_ }
    else          { $firstLineInPair + ' ' + $_.TrimStart() } 
  }
}

Helper index variable $i and the modulo operation ( % ) are simply used to identify which line is the start of a (new) pair, and which one is its second half.
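
To see the alternation in isolation, here is a small sketch that applies the same logic to an in-memory array instead of a file (switch also enumerates array input). The sample strings and the $lines and $result variables are made up for illustration:

```powershell
# Sample lines standing in for the file content (illustrative values only).
$lines = 'A1', '   a2', 'B1', '   b2'

$i = 1
$result = switch ($lines) {
  default {
    # $i++ yields 1, 2, 3, 4, ...; an odd value leaves remainder 1 (truthy),
    # so odd-numbered lines are stored and even-numbered lines complete the pair.
    if ($i++ % 2) { $firstLineInPair = $_ }
    else          { $firstLineInPair + ' ' + $_.TrimStart() }
  }
}

$result   # one joined line per input pair
```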

The switch statement is itself streaming, but it cannot be used as-is as pipeline input. By enclosing it in & { ... }, it can, but that forfeits some of the performance benefits, making it only marginally faster than the optimized Get-Content -ReadCount 2 solution:

& {
  $i = 1
  switch -File yourFile.txt {
    default {
      if ($i++ % 2) { $firstLineInPair = $_ }
      else          { $firstLineInPair + ' ' + $_.TrimStart() } 
    }
  } 
} | Set-Content yourOutputFile.txt

For the best performance when writing to a file, use Set-Content $outFile -Value $(...), albeit at the expense of collecting all output lines in memory first:

Set-Content yourOutputFile.txt -Value $(
  $i = 1
  switch -File yourFile.txt {
    default {
      if ($i++ % 2) { $firstLineInPair = $_ }
      else          { $firstLineInPair + ' ' + $_.TrimStart() } 
    }
  } 
)

The fastest and most concise solution is to use a regex-based approach, which reads the entire file up front:

(Get-Content -Raw yourFile.txt) -replace '(.+)\r?\n(?: *)(.+\r?\n)', '$1 $2'

Note:

The assumption is that all lines are paired, and that the last line has a trailing newline.
The -replace operation matches two consecutive lines, and joins them together with a space, ignoring leading spaces on the second line. For a detailed explanation of the regex and the ability to interact with it, see this regex101.com page.
To save the output to a file, you can pipe directly to Set-Content:

(Get-Content -Raw yourFile.txt) -replace '(.+)\r?\n(?: *)(.+\r?\n)', '$1 $2' |
  Set-Content yourOutputFile.txt

In this case, because the pipeline input to Set-Content is provided by an expression that doesn't involve for-every-input-line calls to script blocks ( { ... } ) (as the switch solution requires), there is virtually no slowdown resulting from use of the pipeline (whose use is generally preferable for conceptual elegance and concision).
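
One thing worth checking with CRLF-terminated files: in .NET regexes, . matches every character except \n, so a greedy (.+) in front of \r?\n can swallow the \r and leave a stray carriage return in the middle of each joined line. The sketch below compares the pattern above with a lazy variant on an in-memory string; the sample text and the $text, $orig and $fixed variables are illustrative assumptions:

```powershell
# Two CRLF-terminated pairs built in memory (sample text is illustrative).
$text = "A1`r`n   a2`r`nB1`r`n   b2`r`n"

# Pattern as shown above: the greedy (.+) also captures the `r before the `n.
$orig = $text -replace '(.+)\r?\n(?: *)(.+\r?\n)', '$1 $2'

# Lazy (.+?) stops as early as possible, so \r? consumes the carriage return instead.
$fixed = $text -replace '(.+?)\r?\n *(.+\r?\n)', '$1 $2'

$orig.Contains("`r ")    # True if a carriage return ended up before the joining space
$fixed.Contains("`r ")   # False if the lazy variant avoided it
```

With LF-only files the two patterns should behave the same, so the lazy form is a low-risk substitution if the line endings are unknown.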

As for what you tried:
The $splitLines-based solution in your question is predicated on having assigned all lines of the input file to this self-chosen variable as an array, which your code does not do.
While you could fill variable $splitLines with an array of lines from your input file with $splitLines = Get-Content yourFile.txt (Get-Content reads text files line by line by default), the switch-based line-by-line solution is more efficient and streams its results. If the results are saved to a file, streaming keeps memory usage constant, which matters with large input sets (though rarely with text files).
A performance tip when reading all lines at once into an array with Get-Content: use -ReadCount 0, which greatly speeds up the operation:
$splitLines = Get-Content -ReadCount 0 yourFile.txt
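
Once $splitLines really holds an array of lines, the loop from the question works with small adjustments. A minimal sketch, using an in-memory sample array in place of the file (the sample values and the $joinedLines variable are illustrative assumptions):

```powershell
# Stand-in for: $splitLines = Get-Content -ReadCount 0 yourFile.txt
$splitLines = 'A1', '   a2', 'B1', '   b2'

# Step through the array two lines at a time; stopping at Count - 1 means
# an unpaired final line is skipped rather than joined with nothing.
$joinedLines = for ($i = 0; $i -lt ($splitLines.Count - 1); $i += 2) {
  $splitLines[$i] + ' ' + $splitLines[$i + 1].TrimStart()
}

$joinedLines   # pipe to Set-Content to save the result to a file
```

The .TrimStart() call is what removes the indentation of every second line, which the original -join ' ' expression would have kept.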
