I want to join every two lines of the input below into one line. Here is the input:
ALPHA-FETOPROTEIN ROUTINE CH 0203 001 02/03/2023@10:45 LIVERF3
###-##-#### #######,#### In lab
ALPHA-FETOPROTEIN ROUTINE CH 0203 234 02/03/2023@11:05 LIVER
###-##-#### ########,######## In lab
ANION GAP STAT CH 0203 124 02/03/2023@11:06 DAY
###-##-#### ######,##### #### In lab
BASIC METABOLIC PANE ROUTINE CH 0203 001 02/03/2023@10:45 LIVERF3
###-##-#### #######,#### ###### In lab
This is the desired output:
ALPHA-FETOPROTEIN ROUTINE CH 0203 001 02/03/2023@10:45 LIVERF3 ###-##-#### #######,#### In lab
ALPHA-FETOPROTEIN ROUTINE CH 0203 234 02/03/2023@11:05 LIVER ###-##-#### ########,######## In lab
ANION GAP STAT CH 0203 124 02/03/2023@11:06 DAY ###-##-#### ######,##### #### In lab
BASIC METABOLIC PANE ROUTINE CH 0203 001 02/03/2023@10:45 LIVERF3 ###-##-#### #######,#### ###### In lab
The code I tried is:
for($i = 0; $i -lt $splitLines.Count; $i += 2){
$splitLines[$i,($i+1)] -join ' '
}
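It comes from "Joining every two lines in Powershell output", but I can't seem to get it to work for me. I'm not very proficient with PowerShell, but I'm limited to what is available to me at work.
Edit: here is the full code I am using, as requested.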
# SET VARIABLES
$inputfile = "C:\Users\Will\Desktop\testfile.txt"
$outputfile = "C:\Users\Will\Desktop\testfileformatted.txt"
$new_output = "C:\Users\Will\Desktop\new_formatted.txt"
# REMOVE EXTRA CHARACTERS
$remove_beginning_capture = "-------------------------------------------------------------------------------"
$remove_end_capture = "==============================================================================="
$remove_line = "------"
$remove_strings_with_spaces = " \d"
Get-Content $inputfile | Where-Object {$_ -notmatch $remove_beginning_capture} | Where-Object {$_ -notmatch $remove_end_capture} | Where-Object {$_ -notmatch $remove_line} | Where-Object {$_ -notmatch $remove_strings_with_spaces} | ? {$_.trim() -ne "" } | Set-Content $outputfile
# Measures line length for loop
$file_lines = gc $outputfile | Measure-Object
#Remove Whitespace
# $whitespace_removed = (Get-Content $outputfile -Raw) -replace '\s+', ' '| Set-Content -Path C:\Users\Will\Desktop\new_formatted.csv
# Combine every other line
$lines = Get-Content $outputfile -Raw
$newcontent = $lines.Replace("`n","")
Write-Host "Content: $newcontent"
$newcontent | Set-Content $new_output
for($i = 0; $i -lt $splitLines.Count; $i += 2){
$splitLines[$i,($i+1)] -join ' '
}
2 answers
u5i3ibmn1#
Just read two lines, then print one.
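A minimal sketch of that idea, not necessarily what the answerer had in mind: Get-Content's -ReadCount parameter can hand the lines over two at a time, and each pair is then joined with a space. The function name Join-LinePair is hypothetical, and the sketch assumes the file contains only the lines to be paired (blank lines already removed, as the question's script does).

```powershell
# Hypothetical helper: joins each pair of consecutive lines with a single space.
function Join-LinePair {
    param(
        [Parameter(Mandatory)] [string] $Path
    )
    # -ReadCount 2 sends the lines through the pipeline in arrays of (up to) two.
    Get-Content -LiteralPath $Path -ReadCount 2 | ForEach-Object {
        # Trim every line of the pair, then join; an odd trailing line passes through alone.
        $_.Trim() -join ' '
    }
}

# Usage (variables taken from the question's script):
# Join-LinePair -Path $outputfile | Set-Content $new_output
```

Trimming each line is an addition of this sketch, so that leading blanks on the second line do not end up as double spaces in the joined output.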
41zrol4v2#
PowerShell-idiomatic solutions:
Use Get-Content with -ReadCount 2 in order to read the lines from your file in pairs, which allows you to process each pair in a ForEach-Object call, where the constituent lines can be joined to form a single output line.
On its own, such a command directly outputs the resulting lines (as the for command in your question does), causing them to print to the display by default. Pipe to Set-Content to save the output to a file.
Performance notes:
Get-Content is quite slow by default - see GitHub issue #7537, and the performance of ForEach-Object and Where-Object could be improved too - see GitHub issue #10982.
A faster variant avoids the ForEach-Object cmdlet in favor of the intrinsic .ForEach() method, and, instead of piping to Set-Content, passes all output lines via the -Value parameter.
A better-performing alternative is to use a switch statement with the -File parameter to process files line by line:
A helper index variable $i and the modulo operation (%) are simply used to identify which line is the start of a (new) pair, and which one is its second half.
The switch statement is itself streaming, but it cannot be used as-is as pipeline input. By enclosing it in & { ... }, it can, but that forfeits some of the performance benefits, making it only marginally faster than the optimized Get-Content -ReadCount 2 solution.
The alternative is Set-Content $outFile -Value $(...), albeit at the expense of collecting all output lines in memory first.
The fastest and most concise solution is to use a regex-based approach, which reads the entire file up front.
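The regex-based approach could look roughly like the sketch below. This is a reconstruction under assumptions, not the answer's original command: the function name Join-LinePairRegex and the exact pattern are illustrative, and the input is assumed to contain no blank lines between pairs.

```powershell
# Hypothetical helper: joins each pair of lines in a single regex pass over the whole text.
function Join-LinePairRegex {
    param(
        [Parameter(Mandatory)] [string] $Text
    )
    # ([^\r\n]+)  first line of the pair (capture group 1)
    # \r?\n       the newline between the two lines (LF or CRLF)
    # [ \t]*      leading blanks on the second line, which are dropped
    # ([^\r\n]+)  second line of the pair (capture group 2)
    $Text -replace '([^\r\n]+)\r?\n[ \t]*([^\r\n]+)', '$1 $2'
}

# Usage (variables taken from the question's script); -Raw reads the file as one string,
# and -NoNewline avoids appending an extra newline to the already-terminated text:
# Join-LinePairRegex (Get-Content -Raw $outputfile) | Set-Content -NoNewline $new_output
```

The character class [^\r\n] is used instead of . so that a carriage return from a CRLF line ending is not captured as part of the first line.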
Note:
The -replace operation matches two consecutive lines, and joins them together with a space, ignoring leading spaces on the second line. For a detailed explanation of the regex and the ability to interact with it, see this regex101.com page.
On piping to Set-Content: because the input to Set-Content is provided by an expression that doesn't involve for-every-input-line calls to script blocks ({ ... }) (as the switch solution requires), there is virtually no slowdown resulting from use of the pipeline (whose use is generally preferable for conceptual elegance and concision).
As for what you tried:
The $splitLines-based solution in your question is predicated on having assigned all lines of the input file to this self-chosen variable as an array, which your code does not do.
While you could fill variable $splitLines with an array of lines from your input file with $splitLines = Get-Content yourFile.txt, given that Get-Content reads text files line by line by default, the switch-based line-by-line solution is more efficient and streams its results (which - if saved to a file - keeps memory usage constant, which matters with large input sets (though rarely with text files)).
A performance tip when reading all lines at once into an array with Get-Content: use -ReadCount 0, which greatly speeds up the operation: $splitLines = Get-Content -ReadCount 0 yourFile.txt